A Complete Teaching Guide From Language to Vision to Custom Models
Why This Article Exists
Most machine learning tutorials start with math. Gradient descent. Loss functions. Backpropagation. Tensors. You spend three weeks on theory and still have not built anything useful.
This guide takes a different approach. AWS has done the hard work of training and hosting powerful ML models. Your job is to understand what each service does, when to use it, how to call it, and, critically, what breaks without it. By the end of this guide you will be able to apply machine learning to real problems without needing a PhD, and you will know the landscape well enough to tell when a managed service is sufficient and when you need something custom.
Every concept follows the same structure: the problem, the service, how to use it, what happens when you run it, the limitations, and what to reach for when those limits are hit.
Table of Contents
- What Machine Learning Actually Is and Why AWS Changed the Game
- Amazon Comprehend - Natural Language Understanding
- Amazon Kendra - Intelligent Enterprise Search
- Amazon Lex - Conversational Interfaces
- Amazon Polly - Text to Speech
- Amazon Rekognition - Image and Video Analysis
- Amazon Textract - Document Intelligence
- Amazon Transcribe - Speech to Text
- Amazon Translate - Language Translation
- Amazon Forecast - Time-Series Prediction
- Amazon Fraud Detector - Fraud Detection at Scale
- Amazon SageMaker - Build Your Own Models
- How the Services Connect - Real Architecture Patterns
- Choosing the Right Service
What Machine Learning Actually Is and Why AWS Changed the Game
The Traditional Problem
Machine learning is the practice of training a system on data so it can make predictions or decisions on new data it has never seen. Recognising that an image contains a dog. Understanding that "I want to cancel" means negative sentiment. Predicting that sales will drop next quarter based on historical patterns.
Before managed ML services, building any of this required:
- Data scientists to design and train models
- Significant compute infrastructure for training
- Engineers to deploy and serve the model
- Ongoing work to retrain as data changed
- Months of elapsed time before anything was in production
Most organisations could not justify that investment. Machine learning was a capability only large technology companies could afford.
What AWS Changed
AWS took the most common ML use cases (language, vision, speech, prediction), trained models at a scale no individual organisation could match, and exposed them as API calls. You send data in, you get a result back. You do not manage the model, the compute, or the training pipeline.
This is the core idea behind AWS's managed ML services. They are not building blocks for ML engineers. They are capabilities for application developers.
The services in this guide fall into two categories:
Pre-trained AI services - Models AWS trained for you. You call an API. You get a result. Comprehend, Kendra, Lex, Polly, Rekognition, Textract, Transcribe, Translate, Forecast, and Fraud Detector all fall here.
Custom ML platform - SageMaker. You bring your data, you design your model, you control the training. SageMaker handles the infrastructure.
Understanding which category a service falls into is the first decision you make in any ML architecture.
Amazon Comprehend - Natural Language Understanding
The Problem
You have 50,000 customer support tickets. You want to know: which ones are urgent? Which products are being complained about? What is the overall sentiment trend this month? Are there any personally identifiable details that should be redacted before the data is shared?
Reading 50,000 tickets by hand is not feasible. Building a custom NLP model to analyse them requires ML expertise you may not have. You need a way to extract meaning from text at scale.
What It Is
Amazon Comprehend is a natural language processing service. It reads text and extracts structure and meaning from it. It does not just search for keywords; it understands language the way a human reader would, identifying what something means, not just which words are present.
What It Can Do
Sentiment Analysis - Determines whether text is Positive, Negative, Neutral, or Mixed. Works at document level and at individual sentence level.
Entity Recognition - Identifies named entities: people, organisations, locations, dates, quantities, events, products. Also supports custom entity types you define.
Key Phrase Extraction - Pulls out the most meaningful phrases from a document.
Language Detection - Identifies which language a document is written in, with a confidence score.
PII Detection and Redaction - Identifies personally identifiable information (names, addresses, phone numbers, credit card numbers) and can redact it from the text.
Topic Modelling - Groups a large collection of documents into topics based on content similarity. Useful for understanding themes across thousands of documents without reading them.
Custom Classification - You train a classifier on your own labelled data to categorise documents into your own categories. For example, routing support tickets to the right department.
Custom Entity Recognition - You define new entity types specific to your domain. A pharmaceutical company might want to extract drug names and dosages that the default model does not know about.
Code Example
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')

text = """
I've been a customer for three years and the recent update completely
broke the mobile app. I can't log in at all. This is unacceptable.
My account ID is ACC-88721 and I need this resolved today.
"""

# Sentiment analysis
sentiment = comprehend.detect_sentiment(
    Text=text,
    LanguageCode='en'
)
print(f"Sentiment: {sentiment['Sentiment']}")
print(f"Scores: {sentiment['SentimentScore']}")

# Entity detection
entities = comprehend.detect_entities(
    Text=text,
    LanguageCode='en'
)
for entity in entities['Entities']:
    print(f"  {entity['Type']}: {entity['Text']} ({entity['Score']:.2f})")

# PII detection
pii = comprehend.detect_pii_entities(
    Text=text,
    LanguageCode='en'
)
for item in pii['Entities']:
    print(f"  PII - {item['Type']}: characters {item['BeginOffset']}-{item['EndOffset']}")
What Would Happen When You Run It
The sentiment call returns:
Sentiment: NEGATIVE
Scores: {'Positive': 0.01, 'Negative': 0.94, 'Neutral': 0.03, 'Mixed': 0.02}
The entity call returns the account ID as an OTHER entity, and potentially "mobile app" as a product reference. The PII call may flag the account ID; depending on its format, it can match patterns for financial or account identifiers.
You now have machine-readable signal from human-written text. Route this ticket to priority support. Tag it with the product affected. Redact the account ID before storing in your analytics warehouse.
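Redaction itself is straightforward once you have the offsets from the PII call. A minimal sketch; the sample entities below are illustrative, shaped like what detect_pii_entities returns:

```python
def redact_pii(text, pii_entities, mask='[REDACTED]'):
    """Replace each detected PII span with a mask, working backwards
    so earlier offsets stay valid as the string changes length."""
    redacted = text
    for entity in sorted(pii_entities, key=lambda e: e['BeginOffset'], reverse=True):
        redacted = (redacted[:entity['BeginOffset']]
                    + mask
                    + redacted[entity['EndOffset']:])
    return redacted

# Illustrative entities, in the shape detect_pii_entities returns
sample_text = "My account ID is ACC-88721 and my phone is 555-0134."
sample_entities = [
    {'Type': 'OTHER', 'BeginOffset': 17, 'EndOffset': 26},
    {'Type': 'PHONE', 'BeginOffset': 43, 'EndOffset': 51},
]
print(redact_pii(sample_text, sample_entities))
# -> My account ID is [REDACTED] and my phone is [REDACTED].
```

Working from the end of the string backwards is the key detail: replacing spans front-to-back would shift every later offset.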
At scale, run this across 50,000 tickets in a batch job using start_sentiment_detection_job for async processing, and you have actionable intelligence in minutes.
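A sketch of what that batch submission might look like. The bucket URIs and IAM role ARN are placeholders, and the real call needs a role that Comprehend is allowed to assume:

```python
def sentiment_job_params(input_s3_uri, output_s3_uri, role_arn):
    """Assemble the request for start_sentiment_detection_job.
    ONE_DOC_PER_LINE treats each line of each input file as one document."""
    return {
        'InputDataConfig': {
            'S3Uri': input_s3_uri,
            'InputFormat': 'ONE_DOC_PER_LINE',
        },
        'OutputDataConfig': {'S3Uri': output_s3_uri},
        'DataAccessRoleArn': role_arn,
        'LanguageCode': 'en',
    }

# Actual call (requires AWS credentials; all names below are placeholders):
# import boto3
# comprehend = boto3.client('comprehend', region_name='us-east-1')
# job = comprehend.start_sentiment_detection_job(**sentiment_job_params(
#     's3://my-tickets-bucket/raw/',
#     's3://my-tickets-bucket/results/',
#     'arn:aws:iam::123456789012:role/ComprehendBatchRole'))
# print(job['JobId'])
```

The job writes its results back to the output S3 prefix as compressed JSON, one result per input document.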
Limitations
Comprehend works on text that is already extracted. If your customer feedback is in PDF files, image scans, or audio recordings, you need another service to extract the text first: Textract for documents, Transcribe for audio. Comprehend then processes the output.
The pre-trained models cover common entity types and general sentiment. For domain-specific language (medical records, legal documents, financial instruments) the default models may produce lower accuracy. This is where Custom Entity Recognition and Custom Classification become necessary, and where the line between pre-trained services and SageMaker begins to blur.
Amazon Kendra - Intelligent Enterprise Search
The Problem
Your company has a SharePoint full of internal documentation. A knowledge base in Confluence. A set of PDFs in S3. An HR portal with policy documents. An employee asks: "What is the parental leave policy for contractors?"
A keyword search returns 47 documents containing the words "parental", "leave", and "contractors." The employee reads through them. The actual answer was on page 8 of a PDF that did not even contain the word "parental"; it used "maternity" and "paternity" instead.
Traditional search finds documents. It does not answer questions.
What It Is
Amazon Kendra is an intelligent enterprise search service powered by machine learning. It does not return a list of documents. It reads your documents, understands them, and returns a direct answer to a natural language question along with the source document and the exact passage where the answer was found.
How It Works
You connect Kendra to your data sources: S3, SharePoint, Confluence, Salesforce, ServiceNow, websites, databases. Kendra indexes the content, understanding not just words but meaning. When a user asks a question, Kendra searches by semantic similarity, not just keyword overlap.
Code Example
import boto3

kendra = boto3.client('kendra', region_name='us-east-1')

INDEX_ID = 'your-kendra-index-id'

response = kendra.query(
    IndexId=INDEX_ID,
    QueryText="What is the parental leave policy for contractors?",
    QueryResultTypeFilter='ANSWER'
)

# Kendra returns ranked results with confidence scores
for result in response['ResultItems']:
    print(f"Type: {result['Type']}")
    print(f"Score: {result['ScoreAttributes']['ScoreConfidence']}")
    if result['Type'] == 'ANSWER':
        print(f"Answer: {result['AdditionalAttributes'][0]['Value']['TextWithHighlightsValue']['Text']}")
    if result['Type'] == 'DOCUMENT':
        print(f"Document: {result['DocumentTitle']['Text']}")
        print(f"Excerpt: {result['DocumentExcerpt']['Text']}")
    print("---")
What Would Happen When You Run It
Kendra does not return 47 documents. It returns the direct answer: the specific paragraph from your HR policy document that describes contractor parental leave. It highlights the relevant sentences. It cites the source document and the page. The answer confidence is scored.
The employee gets an answer in 3 seconds instead of 20 minutes.
What It Is Not
Kendra is not a general-purpose web search engine. It is designed for your private enterprise content. It requires you to set up an index, connect data sources, and pay per query.
Limitations
Kendra answers questions based on what is in your documents. If the answer is not in any indexed document, Kendra cannot generate one; it returns the most relevant documents it found. It is a retrieval system, not a generation system.
For generating answers that synthesise information across multiple documents or reason through problems, the architecture evolves toward Retrieval Augmented Generation (RAG), which combines Kendra's retrieval capability with a generative model. But that is beyond the scope of this guide.
Kendra also requires the content to be text-readable. Scanned image PDFs must be processed through Textract before Kendra can index them.
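One way to wire that pipeline up is to push the Textract-extracted text straight into the index with batch_put_document. A sketch under that assumption; the document ID, title, and pipeline steps are illustrative:

```python
def kendra_document(doc_id, title, text):
    """Build one entry for kendra.batch_put_document.
    Blob must be bytes, and ContentType must match the blob format."""
    return {
        'Id': doc_id,
        'Title': title,
        'Blob': text.encode('utf-8'),
        'ContentType': 'PLAIN_TEXT',
    }

# Pipeline sketch (requires AWS credentials; names are illustrative):
# 1. run textract.detect_document_text(...) on each scanned page
# 2. join the LINE blocks into one plain-text string
# 3. kendra.batch_put_document(
#        IndexId='your-kendra-index-id',
#        Documents=[kendra_document('hr-policy-001', 'HR Policy', extracted_text)])
```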
Amazon Lex - Conversational Interfaces
The Problem
You want to build a chatbot for your banking application. A customer can type: "Transfer $200 to John" or "What is my balance?" or "I want to dispute a charge from last Tuesday." The bot needs to understand the intent, extract the relevant information, ask clarifying questions if information is missing, and trigger the right backend action.
Building this from scratch means building an intent classifier, a slot extractor, a dialogue manager, and a conversation state machine. That is months of work.
What It Is
Amazon Lex is a fully managed conversational AI service for building chat and voice interfaces, built on the same deep learning technologies that power Amazon Alexa. You define intents (what the user wants to do) and slots (the pieces of information needed to fulfil that intent), and Lex handles the conversation flow.
Core Concepts
Intent - A specific action the user wants to take. TransferMoney, CheckBalance, DisputeCharge.
Slot - A piece of data required to fulfil the intent. For TransferMoney: Amount, RecipientName, FromAccount.
Slot Type - The data type and validation for a slot. Built-in slot types handle dates, times, currencies, numbers. Custom slot types handle domain-specific values like account names.
Utterances - Example phrases the user might say to trigger an intent. Lex uses these to train its intent classifier. You do not need hundreds of examples; Lex generalises from a smaller set.
Fulfilment - What happens when all required slots are filled. Typically a Lambda function that executes the business logic.
Code Example - Defining an Intent (via SDK)
import boto3

lex = boto3.client('lexv2-models', region_name='us-east-1')

# Create a bot
bot = lex.create_bot(
    botName='BankingAssistant',
    description='Banking chatbot for balance and transfers',
    roleArn='arn:aws:iam::123456789012:role/LexBotRole',
    dataPrivacy={'childDirected': False},
    idleSessionTTLInSeconds=300
)
BOT_ID = bot['botId']

# Intents and slots are typically configured in the Console or via
# CloudFormation. This shows the runtime interaction:
lex_runtime = boto3.client('lexv2-runtime', region_name='us-east-1')

response = lex_runtime.recognize_text(
    botId=BOT_ID,
    botAliasId='TSTALIASID',
    localeId='en_US',
    sessionId='user-session-001',
    text='I want to transfer 200 dollars to Sarah'
)

print(f"Intent: {response['sessionState']['intent']['name']}")
print(f"State: {response['sessionState']['intent']['state']}")

for slot_name, slot_value in response['sessionState']['intent']['slots'].items():
    if slot_value:
        print(f"Slot '{slot_name}': {slot_value['value']['interpretedValue']}")

# Lex might respond with a clarifying question if a slot is missing:
for message in response.get('messages', []):
    print(f"Bot: {message['content']}")
What Would Happen When You Run It
Input: "I want to transfer 200 dollars to Sarah"
Lex identifies intent TransferMoney. It extracts slot Amount = 200 and RecipientName = Sarah. But FromAccount is missing. Lex generates the clarifying question: "Which account would you like to transfer from?" and waits for the user's response.
When all slots are filled, Lex calls the Lambda fulfilment function with the complete slot values. The Lambda executes the transfer and returns a confirmation message to Lex, which delivers it to the user.
The conversation is stateful. Lex tracks which slots are filled across multiple turns.
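A minimal sketch of what that fulfilment Lambda might look like for TransferMoney, assuming the standard Lex V2 Lambda event format; the transfer logic itself is a placeholder:

```python
def lambda_handler(event, context):
    """Lex V2 fulfilment handler sketch for a TransferMoney intent.
    Lex invokes this once all required slots are filled."""
    intent = event['sessionState']['intent']
    slots = intent['slots']

    amount = slots['Amount']['value']['interpretedValue']
    recipient = slots['RecipientName']['value']['interpretedValue']

    # execute_transfer(amount, recipient) would be your real business
    # logic here (placeholder, not implemented in this sketch)
    intent['state'] = 'Fulfilled'

    # Lex V2 expects sessionState with a dialogAction, plus any messages
    # to relay back to the user
    return {
        'sessionState': {
            'dialogAction': {'type': 'Close'},
            'intent': intent,
        },
        'messages': [{
            'contentType': 'PlainText',
            'content': f'Done. {amount} dollars sent to {recipient}.',
        }],
    }
```

Returning dialogAction type Close tells Lex the conversation for this intent is finished; the message is delivered to the user verbatim.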
Limitations
Lex manages dialogue but does not understand context the way large language models do. It is intent-and-slot based. A conversation that goes off-script (a user who changes topic mid-conversation, asks abstract questions, or gives ambiguous answers) requires careful handling in your slot validation Lambda, or it will confuse the bot.
For open-ended conversations that do not map cleanly to a fixed set of intents, Lex becomes unwieldy. That is a use case for a different generation of models entirely.
Amazon Polly - Text to Speech
The Problem
You are building a navigation app. You need to convert turn-by-turn directions into spoken audio. You have a content platform and want to offer audio versions of your articles. You are building a call centre IVR system and need dynamic audio responses.
Recording a human voice actor for every possible phrase is impractical. Pre-recorded audio cannot handle dynamic content like addresses, names, and real-time data.
What It Is
Amazon Polly converts text into lifelike speech. It uses deep learning to generate audio that sounds human, supports dozens of languages and voices, and can produce speech with three engine types: standard, neural, and long-form.
Voice Types
Standard voices - Fast, cheap, good quality. Use for most applications.
Neural voices - Higher quality, more natural-sounding, slightly higher cost. Recommended for customer-facing applications where voice quality affects perception.
Long-form voices - Optimised for long articles and narration. More natural cadence across extended speech.
Code Example
import boto3

polly = boto3.client('polly', region_name='us-east-1')

# Basic speech synthesis
response = polly.synthesize_speech(
    Text='Turn left onto Victoria Island Expressway in 200 metres.',
    OutputFormat='mp3',
    VoiceId='Joanna',
    Engine='neural'
)

# Save audio to file
with open('navigation.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())

# Using SSML for advanced control
ssml_text = """
<speak>
    Welcome to Torilo Academy.
    <break time="500ms"/>
    Today's session begins at <say-as interpret-as="time" format="hms12">9:00am</say-as>.
    <emphasis level="strong">Please be on time.</emphasis>
    <break time="300ms"/>
    The instructor for today is
    <phoneme alphabet="ipa" ph="onjɛdɪkɑtʃɪ">Onyedikachi</phoneme>.
</speak>
"""

response = polly.synthesize_speech(
    Text=ssml_text,
    TextType='ssml',
    OutputFormat='mp3',
    VoiceId='Matthew',
    Engine='neural'
)
with open('announcement.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())

# For long content, use async synthesis
with open('article.txt') as f:  # any long text to narrate
    long_article_text = f.read()

response = polly.start_speech_synthesis_task(
    Text=long_article_text,
    OutputFormat='mp3',
    VoiceId='Joanna',
    Engine='neural',
    OutputS3BucketName='my-audio-bucket',
    OutputS3KeyPrefix='articles/'
)
task_id = response['SynthesisTask']['TaskId']
SSML - Speech Synthesis Markup Language
SSML is how you control the spoken output beyond the defaults. You can:
- Add pauses with <break>
- Control emphasis with <emphasis>
- Specify how to read numbers, dates, and times with <say-as>
- Control pronunciation with <phoneme>
- Adjust speaking rate and pitch with <prosody>
This is important for names, addresses, abbreviations, and any content where the default pronunciation would sound wrong or unnatural.
What Would Happen When You Run It
The synthesize_speech call returns an audio stream immediately. For short text, the latency is low enough for real-time applications. For long content, start_speech_synthesis_task runs asynchronously and writes the output to S3.
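If you are not using SNS notifications, a small polling helper covers the async case. A sketch, assuming the standard get_speech_synthesis_task response shape:

```python
import time

def wait_for_synthesis(polly, task_id, poll_seconds=5):
    """Poll get_speech_synthesis_task until the task finishes;
    returns the S3 URI of the generated audio."""
    while True:
        task = polly.get_speech_synthesis_task(TaskId=task_id)['SynthesisTask']
        status = task['TaskStatus']  # scheduled | inProgress | completed | failed
        if status == 'completed':
            return task['OutputUri']
        if status == 'failed':
            raise RuntimeError(task.get('TaskStatusReason', 'synthesis failed'))
        time.sleep(poll_seconds)

# Usage, continuing from the start_speech_synthesis_task example:
# audio_uri = wait_for_synthesis(polly, task_id)
```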
Limitations
Polly generates speech. It does not understand speech. If a user speaks to your application, you need a different service to convert their speech back to text before Polly can respond: that is Transcribe, covered later in this guide.
Also, neural voices are not available in all regions. Check the regional availability before designing your architecture around a specific neural voice.
Amazon Rekognition - Image and Video Analysis
The Problem
You have a platform where users upload photos. You need to detect inappropriate content before it is published. You have thousands of product images and need to tag them automatically. You operate a physical premises and need to detect when people enter restricted areas. You process ID documents and need to verify that a submitted selfie matches the ID photo.
Doing any of this with traditional programming is either impossible or produces unreliable results. Machine learning is the only practical approach.
What It Is
Amazon Rekognition is a computer vision service. It analyses images and video and returns structured information about what it finds — objects, people, text, faces, scenes, and potentially unsafe content.
It works without any training. You do not configure it. You call it with an image and it returns what it detects.
What It Can Do
Object and Scene Detection - Identifies what is in an image: "car", "person", "beach", "office", each with a confidence score.
Face Detection - Detects faces and returns attributes: approximate age range, gender presentation, emotions (happy, sad, surprised, angry), whether the person is wearing glasses or a mask, whether eyes are open.
Face Comparison - Compares two faces and returns a similarity score. Used for identity verification: does this selfie match this ID document?
Face Search - Searches a collection of stored faces to find a match. Used for access control, finding a person across a video archive.
Celebrity Recognition - Identifies well-known public figures in images.
Text in Images - Extracts printed text from images road signs, product labels, memes, whiteboards. Not the same as Textract (which handles documents) — Rekognition handles text in natural scene images.
Content Moderation - Detects nudity, graphic violence, visually disturbing content, and other categories of unsafe content. Returns a confidence score per category.
PPE Detection - Detects whether people in images are wearing personal protective equipment: hard hats, face masks, gloves, vests.
Video Analysis - All of the above, applied to video. You submit a video, and Rekognition returns time-stamped results: when a specific face appeared, or that unsafe content occurred at minute 3:42.
Code Example
import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')

# Object and scene detection
with open('image.jpg', 'rb') as image_file:
    image_bytes = image_file.read()

labels = rekognition.detect_labels(
    Image={'Bytes': image_bytes},
    MaxLabels=20,
    MinConfidence=75
)

print("Detected labels:")
for label in labels['Labels']:
    print(f"  {label['Name']}: {label['Confidence']:.1f}%")
    if label.get('Parents'):
        parents = [p['Name'] for p in label['Parents']]
        print(f"    Parents: {', '.join(parents)}")

# Content moderation
moderation = rekognition.detect_moderation_labels(
    Image={'Bytes': image_bytes},
    MinConfidence=70
)

if moderation['ModerationLabels']:
    print("CONTENT WARNING:")
    for label in moderation['ModerationLabels']:
        print(f"  {label['Name']} ({label['ParentName']}): {label['Confidence']:.1f}%")
else:
    print("Content: Safe")

# Face comparison for identity verification
with open('id_photo.jpg', 'rb') as f:
    id_photo = f.read()
with open('selfie.jpg', 'rb') as f:
    selfie = f.read()

comparison = rekognition.compare_faces(
    SourceImage={'Bytes': id_photo},
    TargetImage={'Bytes': selfie},
    SimilarityThreshold=80
)

if comparison['FaceMatches']:
    similarity = comparison['FaceMatches'][0]['Similarity']
    print(f"Face match: {similarity:.1f}% similarity")
    print("Identity verification: PASS" if similarity > 90 else "Identity verification: REVIEW REQUIRED")
else:
    print("Identity verification: FAIL, no matching face found")

# Using an S3 reference instead of bytes (better for large images)
labels_from_s3 = rekognition.detect_labels(
    Image={
        'S3Object': {
            'Bucket': 'my-images-bucket',
            'Name': 'uploads/user-photo.jpg'
        }
    }
)
What Would Happen When You Run It
For an image of a construction site, detect_labels returns: "Construction Site" (99%), "Person" (98%), "Hardhat" (94%), "Safety Vest" (87%), "Machinery" (82%). Cross-referenced with PPE detection, you know whether the workers are properly equipped.
For a user-uploaded photo that contains nudity, detect_moderation_labels returns the specific categories with confidence scores. Your application can automatically reject the upload and log the attempt.
For face comparison between an ID photo and a selfie, you get a similarity percentage. Above 99%: high-confidence match. Between 80% and 99%: match, but flag for human review. Below 80%: fail.
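The video APIs follow an asynchronous pattern: start a job, poll until it succeeds, then walk the time-stamped labels. A sketch of the result handling; the request itself is commented out, and the bucket and file names are illustrative:

```python
def label_timeline(get_label_detection_page):
    """Flatten one get_label_detection response page into
    (seconds, label, confidence) rows. Rekognition timestamps are ms."""
    rows = []
    for item in get_label_detection_page.get('Labels', []):
        rows.append((
            item['Timestamp'] / 1000.0,
            item['Label']['Name'],
            item['Label']['Confidence'],
        ))
    return rows

# Async flow (requires AWS credentials and a video in S3):
# job = rekognition.start_label_detection(
#     Video={'S3Object': {'Bucket': 'my-videos', 'Name': 'cctv/clip.mp4'}},
#     MinConfidence=75)
# ...poll rekognition.get_label_detection(JobId=job['JobId']) until
# JobStatus == 'SUCCEEDED', then feed each page to label_timeline().
```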
Limitations
Rekognition is powerful on static, well-lit, high-resolution images. Accuracy decreases with poor lighting, motion blur, occlusion, and images where faces are small or at extreme angles.
Rekognition can detect that text appears in an image, but for reading structured text from documents invoices, forms, tables Textract is the right tool. Rekognition is for scene text; Textract is for document text.
Custom Labels (Rekognition's feature for training on your own image categories) extends the service to domain-specific use cases, such as identifying specific product defects or company-specific objects, but requires labelled training data.
Amazon Textract - Document Intelligence
The Problem
You receive thousands of invoices per month, all as PDFs or scanned images. Each invoice has a different layout: different vendors, different formats. You need to extract: vendor name, invoice number, line items, amounts, and due dates. You want this data in a structured format you can load into your accounting system.
A standard PDF parser can extract text if the PDF is text-based. But scanned invoices are images. And even text-based PDFs don't know that "Total Due" and "$4,250.00" on the same line are semantically related. You need a system that understands document structure, not just text content.
What It Is
Amazon Textract is an ML service that extracts text, structure, and data from documents. It goes beyond OCR (optical character recognition). It understands forms, tables, key-value pairs, and the spatial relationships between elements on a page.
What It Can Do
Raw Text Extraction - Extracts all text from a document, preserving reading order.
Form Extraction - Identifies key-value pairs. "Invoice Number: INV-2024-0042" becomes a structured pair: key = "Invoice Number", value = "INV-2024-0042".
Table Extraction - Identifies tables and returns their data as structured rows and columns, even if the table spans multiple pages.
Query-Based Extraction - You ask specific questions about the document and Textract finds the answers. "What is the total amount?" "What is the payment due date?" You do not need to know where on the page this information appears.
Signature Detection - Identifies whether a signature is present on a document.
Identity Document Analysis - Specifically trained to extract data from driving licences, passports, and national ID cards.
Code Example
import boto3

textract = boto3.client('textract', region_name='us-east-1')

# For single-page documents: synchronous analysis
with open('invoice.pdf', 'rb') as f:
    document_bytes = f.read()

response = textract.analyze_document(
    Document={'Bytes': document_bytes},
    FeatureTypes=['FORMS', 'TABLES', 'QUERIES'],
    QueriesConfig={
        'Queries': [
            {'Text': 'What is the invoice number?'},
            {'Text': 'What is the total amount due?'},
            {'Text': 'What is the payment due date?'}
        ]
    }
)

# Textract returns a flat list of Block objects; structure is encoded in
# each block's Relationships, which your code must traverse
blocks = response['Blocks']

# Key-value pairs from forms (simplified: production code follows the
# CHILD relationships to assemble the actual key and value text)
for block in blocks:
    if block['BlockType'] == 'KEY_VALUE_SET' and 'KEY' in block.get('EntityTypes', []):
        print("Form field detected")

# Query-based extraction (cleaner API for specific fields)
for block in blocks:
    if block['BlockType'] == 'QUERY':
        query_text = block['Query']['Text']
        # Find associated QUERY_RESULT blocks via relationships
        for rel in block.get('Relationships', []):
            if rel['Type'] == 'ANSWER':
                for result_id in rel['Ids']:
                    result_block = next(
                        b for b in blocks if b['Id'] == result_id
                    )
                    print(f"Q: {query_text}")
                    print(f"A: {result_block.get('Text', 'No answer found')}")

# For multi-page documents: asynchronous analysis
response = textract.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': 'my-documents-bucket',
            'Name': 'invoices/invoice-march-2024.pdf'
        }
    },
    FeatureTypes=['FORMS', 'TABLES'],
    NotificationChannel={
        'SNSTopicArn': 'arn:aws:sns:us-east-1:123456789012:TextractResults',
        'RoleArn': 'arn:aws:iam::123456789012:role/TextractRole'
    }
)
job_id = response['JobId']
print(f"Async job started: {job_id}")
# Poll with get_document_analysis(JobId=job_id) or wait for the SNS notification
What Would Happen When You Run It
For an invoice, Textract returns form fields as structured key-value pairs: {"Invoice Number": "INV-2024-0042", "Vendor": "Acme Supplies Ltd", "Due Date": "30 April 2024"}. The line items table comes back as a 2D array of cells. The query-based extraction returns direct answers to your specific questions.
This structured data flows directly into your accounting system, your ERP, or your reconciliation pipeline no manual data entry required.
The Difference Between Textract and Rekognition's Text Detection
Rekognition's text detection (detect_text) is designed for scene text: reading a sign in a photo, extracting text from a meme, reading a car number plate in an image. It returns raw text strings with bounding box positions.
Textract is designed for documents: it understands that "Amount Due" and "$4,250.00" are a key-value pair, that a grid of cells is a table, and that text on different lines has structural relationships. Always use Textract for documents. Use Rekognition for text that appears incidentally in natural scene images.
Limitations
Textract accuracy depends on document quality. Heavily skewed, very low resolution, or coffee-stained scans will produce errors. Pre-processing images (deskewing, contrast enhancement) before sending to Textract improves results significantly.
Complex multi-column layouts, nested tables, and documents with unusual structures may require post-processing to correctly reassemble the data. The raw Block response from Textract is a flat list your code must traverse the relationship graph to reconstruct the document structure.
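That traversal is mechanical once you index the blocks by Id. A sketch of reconstructing form fields from the flat Block list, based on the documented KEY_VALUE_SET relationship types:

```python
def form_fields(blocks):
    """Rebuild key -> value text from Textract's flat Block list by
    following CHILD relationships (to words) and KEY -> VALUE links."""
    by_id = {b['Id']: b for b in blocks}

    def text_of(block):
        # Concatenate the WORD children of a block into one string;
        # checkboxes come back as SELECTION_ELEMENT blocks
        words = []
        for rel in block.get('Relationships', []):
            if rel['Type'] == 'CHILD':
                for cid in rel['Ids']:
                    child = by_id[cid]
                    if child['BlockType'] == 'WORD':
                        words.append(child['Text'])
                    elif child['BlockType'] == 'SELECTION_ELEMENT':
                        words.append(child['SelectionStatus'])
        return ' '.join(words)

    fields = {}
    for block in blocks:
        if block['BlockType'] == 'KEY_VALUE_SET' and 'KEY' in block.get('EntityTypes', []):
            key_text = text_of(block)
            value_text = ''
            for rel in block.get('Relationships', []):
                if rel['Type'] == 'VALUE':
                    value_text = ' '.join(text_of(by_id[vid]) for vid in rel['Ids'])
            fields[key_text] = value_text
    return fields
```

Feed it the Blocks list from analyze_document with FORMS enabled and you get a plain dictionary such as {"Invoice Number": "INV-2024-0042"}.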
Amazon Transcribe - Speech to Text
The Problem
You record every customer support call. You have compliance requirements to retain transcripts. You want to search calls by topic, detect which calls mentioned a specific product, measure how long agents spend talking versus listening, and automatically flag calls where the customer expressed strong negative sentiment.
You have 10,000 hours of audio. You cannot transcribe them manually.
What It Is
Amazon Transcribe is an automatic speech recognition service. It converts spoken audio into text. It supports batch processing of recorded audio files and streaming transcription for real-time applications.
What It Can Do
Batch Transcription - Submit an audio file in S3, get back a transcript. Supports MP3, MP4, WAV, FLAC, OGG, AMR, WebM.
Streaming Transcription - Real-time transcription of a live audio stream. Useful for live captioning, real-time agent assist, and voice interfaces.
Speaker Identification - Distinguishes between multiple speakers in a recording and labels each segment by speaker. Useful for separating agent speech from customer speech in call recordings.
Custom Vocabulary - Teaches Transcribe how to correctly handle domain-specific terms, product names, and proper nouns that may not be in the base model. "Kachi", "Torilo", "cfn-hup", "EKS" — without custom vocabulary, these may be misheard and mistranscribed.
Vocabulary Filtering - Automatically masks profanity and other specified words in the transcript.
Custom Language Models - Train on your own domain-specific text corpus for higher accuracy in specialised fields like medicine, law, or finance.
Medical Transcription - A specialised variant trained on medical terminology and conversation patterns, with HIPAA eligibility.
Call Analytics - A higher-level API specifically for call centre recordings. Returns sentiment per speaker per segment, talk time ratios, interruptions, loudness, non-talk time, and issue detection.
Code Example
import boto3
import time

transcribe = boto3.client('transcribe', region_name='us-east-1')

# Batch transcription
job_name = f"support-call-{int(time.time())}"

transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': 's3://my-calls-bucket/recordings/call-20240416.mp3'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 2,  # Agent and customer
        'VocabularyName': 'TechSupportVocabulary'
    },
    OutputBucketName='my-transcripts-bucket'
)

# Poll for completion
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    job_status = status['TranscriptionJob']['TranscriptionJobStatus']
    if job_status == 'COMPLETED':
        transcript_uri = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
        print(f"Transcript available at: {transcript_uri}")
        break
    elif job_status == 'FAILED':
        print(f"Job failed: {status['TranscriptionJob']['FailureReason']}")
        break
    print(f"Status: {job_status}, waiting...")
    time.sleep(15)

# Call Analytics: a higher-level API for call centres
transcribe.start_call_analytics_job(
    CallAnalyticsJobName=f"analytics-{int(time.time())}",
    Media={'MediaFileUri': 's3://my-calls-bucket/recordings/call-20240416.mp3'},
    OutputLocation='s3://my-transcripts-bucket/analytics/',
    DataAccessRoleArn='arn:aws:iam::123456789012:role/TranscribeRole',
    ChannelDefinitions=[
        {'ChannelId': 0, 'ParticipantRole': 'AGENT'},
        {'ChannelId': 1, 'ParticipantRole': 'CUSTOMER'}
    ]
)
What Would Happen When You Run It
For a 10-minute support call, the batch job typically completes in 3-5 minutes. The transcript JSON includes the full text with timestamps and, when speaker labels are enabled, each segment tagged with spk_0 or spk_1.
Call Analytics returns far more than a transcript: sentiment scores per speaker per turn, how long each party spoke, how many times either party interrupted the other, and whether the interaction matched any configured issue categories (refund request, technical problem, billing dispute).
This data feeds Comprehend for deeper NLP, feeds your BI dashboards, and feeds your quality assurance workflow.
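One of the metrics described above, talk time per speaker, can be computed directly from the speaker_labels segments in the batch transcript JSON. This is a sketch: the segment shape below is a simplified slice of the real transcript output, and segments_talk_time is a hypothetical helper name, not a Transcribe API.

```python
# Sketch: per-speaker talk time from the speaker_labels segments of a
# Transcribe batch transcript. Segment shape is simplified for illustration.
from collections import defaultdict

def segments_talk_time(speaker_segments):
    """Sum (end_time - start_time) per speaker label, in seconds."""
    totals = defaultdict(float)
    for seg in speaker_segments:
        totals[seg['speaker_label']] += float(seg['end_time']) - float(seg['start_time'])
    return dict(totals)

segments = [
    {'speaker_label': 'spk_0', 'start_time': '0.0', 'end_time': '12.5'},
    {'speaker_label': 'spk_1', 'start_time': '12.5', 'end_time': '20.0'},
    {'speaker_label': 'spk_0', 'start_time': '20.0', 'end_time': '31.0'},
]
talk_time = segments_talk_time(segments)
print(talk_time)  # {'spk_0': 23.5, 'spk_1': 7.5}
```

With real data, a heavily skewed ratio (the agent talking far more than the customer, or vice versa) is a common quality-assurance signal.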
Limitations
Accuracy depends on audio quality. Background noise, multiple people speaking simultaneously, strong accents, and low bitrate recordings all reduce accuracy. Custom vocabulary helps significantly for domain-specific terms.
Transcribe does not translate. If your call centre serves customers in multiple languages, you need Transcribe to transcribe the audio in the original language, then Translate to convert it to your working language, then Comprehend to analyse the translated text. These services are designed to work together in sequence.
Amazon Translate - Language Translation
The Problem
Your e-commerce platform operates in 15 countries. Product descriptions are written in English. Support tickets arrive in Portuguese, Arabic, Yoruba, French, and Spanish. Your global team shares internal documents. You cannot hire translators for every language pair, and machine translation has historically been poor enough that it creates more confusion than it resolves.
What It Is
Amazon Translate is a neural machine translation service. It uses deep learning to translate text between languages with high accuracy and natural-sounding output.
It supports over 75 languages, handles real-time translation via API call, and supports batch translation of large document collections stored in S3.
Code Example
import boto3
translate = boto3.client('translate', region_name='us-east-1')
# Real-time translation
response = translate.translate_text(
    Text="""
    The recent software update has caused significant issues with our
    production environment. We need immediate assistance to restore service.
    """,
    SourceLanguageCode='en',
    TargetLanguageCode='fr'
)
print(f"Translated text: {response['TranslatedText']}")
print(f"Source: {response['SourceLanguageCode']}")
print(f"Target: {response['TargetLanguageCode']}")

# Auto-detect source language
response = translate.translate_text(
    Text="O aplicativo parou de funcionar após a atualização.",
    SourceLanguageCode='auto',  # Translate detects the language
    TargetLanguageCode='en'
)
print(f"Detected source language: {response['SourceLanguageCode']}")
print(f"Translation: {response['TranslatedText']}")
# Custom terminology — preserve your brand names and technical terms
translate.import_terminology(
    Name='TechTerminology',
    MergeStrategy='OVERWRITE',
    TerminologyData={
        'File': b"en,fr,es,pt\n"
                b"CloudFormation,CloudFormation,CloudFormation,CloudFormation\n"
                b"SageMaker,SageMaker,SageMaker,SageMaker\n"
                b"Lambda function,fonction Lambda,función Lambda,função Lambda\n",
        'Format': 'CSV'
    }
)
# Use custom terminology in translation
response = translate.translate_text(
    Text="Deploy your application using a Lambda function and CloudFormation.",
    SourceLanguageCode='en',
    TargetLanguageCode='fr',
    TerminologyNames=['TechTerminology']
)

# Batch translation of documents in S3
translate.start_text_translation_job(
    JobName='ProductDescriptionTranslation',
    InputDataConfig={
        'S3Uri': 's3://my-content-bucket/product-descriptions/en/',
        'ContentType': 'text/plain'
    },
    OutputDataConfig={
        'S3Uri': 's3://my-content-bucket/product-descriptions/'
    },
    DataAccessRoleArn='arn:aws:iam::123456789012:role/TranslateRole',
    SourceLanguageCode='en',
    TargetLanguageCodes=['fr', 'es', 'pt', 'ar', 'de', 'ja']
)
What Would Happen When You Run It
The real-time call returns the translated text in milliseconds. The auto-detect variant identifies the source language and translates it. With custom terminology, "Lambda function" is preserved as "fonction Lambda" in French rather than being literally translated to something that loses technical meaning.
The batch job translates an entire folder of English product descriptions into six languages simultaneously, writing the results to your S3 bucket. No human translator involvement. Your marketing team reviews and approves before publishing.
For the support ticket scenario: Transcribe converts the Spanish audio to text, Translate converts the Spanish text to English, Comprehend analyses the English text for sentiment and entities, and your support system routes the ticket to the right queue. Four services, one pipeline, fully automated.
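The Translate-to-Comprehend step of that pipeline can be sketched as a small function that receives the two boto3 clients as arguments, which keeps the routing logic testable without AWS. The function name analyse_ticket and the queue names are illustrative choices, not part of any AWS API.

```python
# Sketch of the Translate → Comprehend step of the support-ticket pipeline.
# Clients are injected so the logic can run against stubs in tests.
def analyse_ticket(text, translate, comprehend):
    translated = translate.translate_text(
        Text=text,
        SourceLanguageCode='auto',
        TargetLanguageCode='en'
    )
    english = translated['TranslatedText']
    sentiment = comprehend.detect_sentiment(Text=english, LanguageCode='en')
    # Illustrative routing rule: negative tickets go to the escalation queue
    queue = 'escalation' if sentiment['Sentiment'] == 'NEGATIVE' else 'standard'
    return {
        'source_language': translated['SourceLanguageCode'],
        'english_text': english,
        'sentiment': sentiment['Sentiment'],
        'queue': queue
    }

# In production, pass real clients:
# import boto3
# result = analyse_ticket(ticket_text,
#                         boto3.client('translate'),
#                         boto3.client('comprehend'))
```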
Limitations
Neural machine translation is excellent for formal and semi-formal text. Idioms, colloquialisms, highly technical jargon without custom terminology, and language with deep cultural context may produce output that is grammatically correct but tonally wrong or contextually off.
Custom terminology helps with product names and technical terms but does not teach Translate domain context. For heavily regulated content such as legal, medical, or financial text, human review of machine-translated output is still necessary.
Amazon Forecast - Time-Series Prediction
The Problem
You run a retail operation. You need to know how much stock to order next month, how many staff to schedule next weekend, and how much cloud compute capacity to provision for the holiday season.
All of these are time-series forecasting problems: predicting a future value from historical patterns plus additional factors. Doing this manually means spreadsheets, gut feel, and systematic errors. Doing it with a custom statistical model requires a data scientist and months of model development.
What It Is
Amazon Forecast is a fully managed time-series forecasting service. You give it your historical data and, optionally, related data (price changes, promotional events, weather, holidays), and it automatically trains and evaluates multiple forecasting models, selects the best one, and provides predictions with confidence intervals.
Key Concepts
Target Time Series - The data you want to forecast. Sales volume, energy consumption, website traffic, inventory demand.
Related Time Series - Additional time series that correlate with your target. Promotional calendar, pricing history, weather data. These help Forecast understand why the target changed, not just when.
Item Metadata - Attributes of the things you are forecasting. Product category, store location, size tier. These help Forecast generalise across many similar items.
Dataset Group - The container that holds all your data sources.
Predictor - The trained model. Forecast automatically tries AutoML (selecting from ARIMA, ETS, Prophet, DeepAR+, CNN-QR, and others) or you can specify an algorithm.
Forecast - The actual predictions generated from a trained predictor.
Code Example
import boto3
import json
forecast = boto3.client('forecast', region_name='us-east-1')
# Step 1 — Create dataset group
dataset_group = forecast.create_dataset_group(
    DatasetGroupName='RetailForecastGroup',
    Domain='RETAIL'
)
DATASET_GROUP_ARN = dataset_group['DatasetGroupArn']

# Step 2 — Create target time series dataset schema
target_dataset = forecast.create_dataset(
    DatasetName='RetailSalesTarget',
    Domain='RETAIL',
    DatasetType='TARGET_TIME_SERIES',
    Schema={
        'Attributes': [
            {'AttributeName': 'item_id', 'AttributeType': 'string'},
            {'AttributeName': 'timestamp', 'AttributeType': 'timestamp'},
            {'AttributeName': 'demand', 'AttributeType': 'float'}
        ]
    },
    DataFrequency='D'  # Daily frequency
)

# Step 3 — Import historical data from S3
# Your CSV format: item_id,timestamp,demand
# e.g.: PRODUCT-001,2023-01-01,142.0
forecast.create_dataset_import_job(
    DatasetImportJobName='RetailHistoricalImport',
    DatasetArn=target_dataset['DatasetArn'],
    DataSource={
        'S3Config': {
            'Path': 's3://my-forecast-data/sales-history/',
            'RoleArn': 'arn:aws:iam::123456789012:role/ForecastRole'
        }
    },
    TimestampFormat='yyyy-MM-dd'
)

# Step 4 — Create predictor (AutoML selects best algorithm)
predictor = forecast.create_auto_predictor(
    PredictorName='RetailAutoPredictor',
    ForecastHorizon=30,     # Forecast 30 days ahead
    ForecastFrequency='D',  # Daily granularity
    DataConfig={
        'DatasetGroupArn': DATASET_GROUP_ARN
    },
    OptimizationMetric='WAPE'  # Weighted Absolute Percentage Error
)
PREDICTOR_ARN = predictor['PredictorArn']

# Step 5 — Create forecast
forecast_result = forecast.create_forecast(
    ForecastName='RetailDemandForecast30Day',
    PredictorArn=PREDICTOR_ARN
)
FORECAST_ARN = forecast_result['ForecastArn']
# Step 6 — Query predictions
forecast_query = boto3.client('forecastquery', region_name='us-east-1')
prediction = forecast_query.query_forecast(
    ForecastArn=FORECAST_ARN,
    Filters={'item_id': 'PRODUCT-001'}
)

# Results include P10, P50, P90 quantiles; each is a list of data points
for point in prediction['Forecast']['Predictions']['p50']:
    print(f"Date: {point['Timestamp']}, Predicted Demand: {float(point['Value']):.0f}")
What Would Happen When You Run It
After training completes (which takes time proportional to your data volume and the number of algorithms tested), the forecast returns predictions at multiple quantiles:
- P10 - Actual demand has only a 10% chance of falling below this value. The optimistic lower bound.
- P50 - The median prediction. Actual demand is equally likely to land above or below it.
- P90 - Actual demand has a 90% chance of falling below this value. The conservative upper bound.
For inventory management, you order to the P90 to avoid stock-outs. For staffing, you schedule to the P50 and build in buffer. For capacity planning, you provision to the P90 with auto-scaling for headroom.
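That inventory rule can be expressed in a few lines: sum the forecast at the chosen quantile over the horizon and subtract what is already on hand. This is a sketch over the quantile-to-points shape returned by query_forecast; order_quantity is a hypothetical helper, not a Forecast API.

```python
# Sketch: turn a quantile forecast into an order quantity. Ordering to the
# P90 covers demand in roughly 9 out of 10 scenarios.
def order_quantity(forecast_points, on_hand, quantile='p90'):
    """Sum forecast demand at the chosen quantile, minus stock on hand."""
    total_demand = sum(float(p['Value']) for p in forecast_points[quantile])
    return max(0, round(total_demand - on_hand))

points = {
    'p50': [{'Value': '100'}, {'Value': '110'}],
    'p90': [{'Value': '140'}, {'Value': '155'}],
}
print(order_quantity(points, on_hand=80))  # P90 ordering: 215
```

Swapping the quantile argument to 'p50' gives the leaner staffing-style estimate instead.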
What Would Happen Without It
Excel trend lines do not capture seasonal patterns, promotional effects, or correlations with external data. The result is systematic over- or under-ordering. Forecast typically reduces forecasting error by 20-50% compared to traditional statistical methods, based on AWS's published benchmarks.
Limitations
Forecast requires sufficient historical data. The minimum is generally one full cycle of whatever pattern you are trying to predict: for monthly patterns, at least 12 months of history. For annual seasonality, at least two full years is recommended.
Forecast is a prediction service, not a causal analysis service. It tells you what will happen, not why. It will not explain that sales will drop because a competitor is launching a rival product next month; you need to bring that knowledge in yourself as a related time series (a promotional/event calendar).
Amazon Fraud Detector - Fraud Detection at Scale
The Problem
You run a payments platform. Fraudulent transactions cost you money directly (chargebacks) and indirectly (reputational damage, regulatory scrutiny). You have transaction history. You know which past transactions were fraudulent. You need a system that evaluates new transactions in real time and flags suspicious activity before you process them.
Building this model in-house requires: labelled training data, feature engineering, model selection, training infrastructure, a real-time inference endpoint, and ongoing model maintenance as fraud patterns evolve. That is a 6-12 month ML engineering project.
What It Is
Amazon Fraud Detector is a fully managed fraud detection service. It uses your historical transaction data to train a custom fraud detection model, combines it with rules you define, and evaluates new events in real time — returning a fraud score and an outcome (approve, review, reject) in milliseconds.
Key Concepts
Event Type - The kind of event you are evaluating. online_payment, account_registration, login_attempt.
Entity - The actor in the event. Typically customer.
Variables - The features you send with each event. IP address, email address, billing amount, device fingerprint, transaction velocity.
Labels - Your historical fraud/legitimate classification for training.
Model - The trained fraud detection model. Fraud Detector uses Online Fraud Insights (OFI) and Transaction Fraud Insights (TFI) model types, trained on your data plus Amazon's aggregated fraud intelligence.
Rules - Logic-based conditions that define outcomes. "If fraud score > 900, reject. If score > 700, send to review. Otherwise, approve."
Detector - Combines the model and rules into a deployable decision engine.
Code Example
import boto3
import json
from datetime import datetime
frauddetector = boto3.client('frauddetector', region_name='us-east-1')
# Evaluate a real-time transaction
response = frauddetector.get_event_prediction(
    detectorId='payment-fraud-detector',
    detectorVersionId='1',
    eventId='TXN-20240416-098721',
    eventTypeName='online_payment',
    eventTimestamp=datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ'),
    entities=[
        {
            'entityType': 'customer',
            'entityId': 'CUST-88721'
        }
    ],
    eventVariables={
        'ip_address': '197.210.88.42',
        'email_address': 'user@example.com',
        'billing_amount': '4250.00',
        'currency': 'USD',
        'billing_country': 'NG',
        'shipping_country': 'US',  # Mismatch - potential signal
        'payment_method': 'credit_card',
        'card_bin': '412345',
        'device_fingerprint': 'fp-abc123def456',
        'transactions_last_hour': '3',  # Velocity signal
        'transactions_last_day': '12'
    }
)

# Evaluate the response
rule_results = response['ruleResults']
model_scores = response['modelScores']

print("Model Scores:")
for score in model_scores:
    print(f"  {score['modelVersion']['modelId']}: {score['scores']}")

print("\nRule Outcomes:")
for rule in rule_results:
    print(f"  Rule '{rule['ruleId']}': outcomes = {rule['outcomes']}")

# The detector returns an outcome: APPROVE, REVIEW, or REJECT
# based on the rules configured in the detector
What Would Happen When You Run It
For a transaction where the billing country is Nigeria and the shipping address is in the United States, with 3 transactions in the last hour (velocity signal), the model may return a score of 820 out of 1000. The rule "score > 700 → REVIEW" fires. Your payment processor holds the transaction and sends it to your fraud review queue. A human reviews it within minutes and either approves or rejects.
For a routine transaction (known customer, familiar device, typical amount, consistent country), the score might be 120. The rule "score < 400 → APPROVE" fires. The transaction processes in under 50ms.
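The rule logic behind those two outcomes is simple to state in plain Python. The thresholds (900/700) mirror the example rules earlier in this section; in Fraud Detector itself this logic lives in the detector's rule expressions, not in your application code, so this is purely illustrative.

```python
# Illustrative version of the detector's rule logic, mirroring the example
# rules in this section: >900 reject, >700 review, otherwise approve.
def route_transaction(score):
    if score > 900:
        return 'REJECT'
    if score > 700:
        return 'REVIEW'
    return 'APPROVE'

print(route_transaction(820))  # REVIEW: held for the fraud queue
print(route_transaction(120))  # APPROVE: processes immediately
```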
What Would Happen Without It
Rule-based fraud detection without ML is a game of catch-up. You write rules for the fraud patterns you have already seen. Fraudsters evolve. Your rules never quite keep up. ML-based fraud detection learns statistical patterns that rules cannot capture: combinations of factors that individually seem normal but together are suspicious.
Limitations
Fraud Detector requires labelled historical data. If you are a new platform without a track record of fraudulent transactions, there is no training data. In this case, AWS provides a bootstrapped model using aggregate fraud intelligence while your own data accumulates, and the model improves significantly as you feed it confirmed fraud outcomes over time.
Fraud Detector also requires you to close the feedback loop. When your fraud review team determines that a flagged transaction was legitimate or confirms it was fraud, that outcome must be fed back into the model. A Fraud Detector that never receives feedback will not improve and will eventually degrade as fraud patterns evolve.
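Closing that loop means recording the confirmed outcome against the original event, which the UpdateEventLabel API supports. This is a hedged sketch: record_review_outcome is an illustrative wrapper, and the label values ('fraud', 'legit') assume labels with those names exist in your event type configuration. The client is passed in so the logic can be tested without AWS.

```python
# Sketch: feed a confirmed review outcome back to Fraud Detector so future
# model versions can train on it. Assumes 'fraud'/'legit' labels are defined
# for the 'online_payment' event type.
from datetime import datetime, timezone

def record_review_outcome(client, event_id, is_fraud):
    label = 'fraud' if is_fraud else 'legit'
    client.update_event_label(
        eventId=event_id,
        eventTypeName='online_payment',
        assignedLabel=label,
        labelTimestamp=datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    )
    return label

# In production:
# record_review_outcome(boto3.client('frauddetector'),
#                       'TXN-20240416-098721', is_fraud=True)
```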
Amazon SageMaker - Build Your Own Models
The Problem With All the Previous Services
Every service in this guide so far is a pre-trained model for a specific task. Sentiment analysis. Object detection. Speech recognition. They are powerful. They are fast to implement. They work for the majority of use cases.
But they do not work for everything.
You are a hospital, and you need to predict patient readmission risk from electronic health records. No pre-trained AWS service does this. You have tabular data with hundreds of clinical features, and the model needs to be trained on your specific patient population, validated against your specific clinical outcomes, and compliant with your specific regulatory requirements.
You need to build a custom model. SageMaker is how you do that.
What It Is
Amazon SageMaker is a fully managed ML platform. It handles the infrastructure for every phase of the ML lifecycle: data preparation, model training, model evaluation, model deployment, and model monitoring.
You write the code. SageMaker runs it on managed compute, at scale, with the infrastructure complexity abstracted away.
The ML Lifecycle in SageMaker
Data Preparation - SageMaker Data Wrangler and Processing
SageMaker Processing runs data transformation jobs on managed compute. You write a Python script that transforms raw data into training-ready features. SageMaker runs it on a cluster of instances you specify, then shuts the cluster down.
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
processor = ScriptProcessor(
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310',
    command=['python3'],
    instance_type='ml.m5.xlarge',
    instance_count=1,
    role='arn:aws:iam::123456789012:role/SageMakerRole'
)
processor.run(
    code='preprocessing.py',
    inputs=[
        ProcessingInput(
            source='s3://my-bucket/raw-data/',
            destination='/opt/ml/processing/input'
        )
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output',
            destination='s3://my-bucket/processed-data/'
        )
    ]
)
Model Training - SageMaker Training Jobs
A training job launches compute, runs your training script, saves the model artifact to S3, and shuts everything down. You are billed only for the time the training job runs.
import sagemaker
from sagemaker.pytorch import PyTorch
estimator = PyTorch(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    instance_type='ml.p3.2xlarge',  # GPU instance for deep learning
    instance_count=1,
    framework_version='2.0.0',
    py_version='py310',
    hyperparameters={
        'epochs': 50,
        'learning_rate': 0.001,
        'batch_size': 32
    },
    output_path='s3://my-bucket/model-artifacts/'
)
estimator.fit({
    'train': 's3://my-bucket/processed-data/train/',
    'validation': 's3://my-bucket/processed-data/validation/'
})
Hyperparameter Tuning - Automatic Model Optimization
Instead of manually trying different hyperparameter combinations, SageMaker Automatic Model Tuning runs multiple training jobs in parallel, searching for the combination that produces the best metric.
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='validation:accuracy',
    objective_type='Maximize',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.0001, 0.01),
        'batch_size': IntegerParameter(16, 128),
        'epochs': IntegerParameter(20, 100)
    },
    max_jobs=20,
    max_parallel_jobs=4
)
tuner.fit({'train': 's3://my-bucket/processed-data/train/'})
Model Deployment - SageMaker Endpoints
Once trained, deploy the model to a managed HTTPS endpoint. SageMaker provisions the instance, loads the model, and serves predictions.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='patient-readmission-predictor'
)

# Real-time inference
import json
result = predictor.predict(json.dumps({
    'age': 67,
    'admission_type': 'emergency',
    'num_procedures': 3,
    'num_medications': 12,
    'time_in_hospital': 5,
    'number_diagnoses': 8
}))
print(f"Readmission probability: {result}")
Model Monitoring - Detecting Data Drift
After deployment, SageMaker Model Monitor continuously evaluates incoming requests against a baseline. If the statistical properties of incoming data drift significantly from the training data distribution, it raises an alert. This is how you detect that your model is receiving inputs it was not trained for before the predictions silently degrade.
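The core idea behind drift detection can be shown in a toy form: compare a statistic of live traffic against the training baseline and alert when the gap is too large. Model Monitor's real checks and configuration are far richer than this; the function below (drifted, a hypothetical name) only illustrates the concept.

```python
# Toy illustration of a drift check: alert when the live mean of a feature
# moves more than `threshold` baseline standard deviations from the baseline.
def drifted(live_values, baseline_mean, baseline_std, threshold=3.0):
    """True when the live mean has moved too far from the training baseline."""
    live_mean = sum(live_values) / len(live_values)
    return abs(live_mean - baseline_mean) > threshold * baseline_std

# Trained on patients averaging 5 days in hospital (std 1.0); live traffic
# suddenly averaging ~12 days should trigger an alert.
print(drifted([11, 12, 13], baseline_mean=5.0, baseline_std=1.0))  # True
print(drifted([4, 5, 6], baseline_mean=5.0, baseline_std=1.0))     # False
```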
SageMaker Studio
SageMaker Studio is an integrated development environment for ML: a browser-based interface that provides notebooks, experiment tracking, a model registry, pipelines, and monitoring in one place. Think of it as the IDE for the entire ML workflow.
What Would Happen Without SageMaker
You provision your own EC2 GPU instances. You install CUDA, frameworks, and dependencies manually. You write your own job orchestration. You manage the training compute yourself, paying for idle time between experiments. You write your own model-serving infrastructure. You build your own monitoring. This is months of infrastructure work before your data scientists can focus on the actual modelling.
SageMaker eliminates that infrastructure layer.
Limitations
SageMaker is powerful, but it is not simple. The learning curve is real. The number of concepts (estimators, processing jobs, pipelines, endpoints, model registry, feature store, experiments) is large. For a team new to ML, starting with the managed pre-trained services and reaching for SageMaker only when those services are genuinely insufficient is the right approach.
SageMaker is also not cheap at scale. GPU training instances are expensive. Running a real-time endpoint 24/7 adds up. Use SageMaker Serverless Inference for endpoints with infrequent traffic, and shut down development notebooks and training clusters when not in use.
How the Services Connect - Real Architecture Patterns
The real power of these services is not any one of them in isolation. It is how they compose.
Pattern 1 - Intelligent Document Processing Pipeline
Document uploaded to S3
→ Textract extracts text, tables, key-value pairs
→ Comprehend performs entity recognition and sentiment analysis
→ Translate converts to working language if non-English
→ Results stored in DynamoDB
→ Kendra indexes documents for search
→ Human review triggered for low-confidence extractions
Use case: insurance claim processing, invoice automation, contract analysis.
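The "human review for low-confidence extractions" gate in this pipeline reduces to a confidence threshold over the extracted fields. The field/confidence shape below is simplified from Textract's actual output, and needs_human_review plus the 90% threshold are illustrative choices.

```python
# Sketch of the human-review gate: flag any extracted field whose confidence
# falls below a threshold, so only those fields go to a reviewer.
def needs_human_review(fields, min_confidence=90.0):
    """Return the names of fields whose extraction confidence is below threshold."""
    return [name for name, (value, conf) in fields.items() if conf < min_confidence]

extracted = {
    'policy_number': ('POL-2024-0042', 99.1),
    'claim_amount': ('4,250.00', 71.3),   # low confidence: route to a human
    'claimant_name': ('A. Okafor', 96.8),
}
flagged = needs_human_review(extracted)
print(flagged)  # ['claim_amount']
```

Tuning the threshold trades reviewer workload against the risk of a wrong value flowing downstream unchecked.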
Pattern 2 - Multilingual Customer Support
Customer calls support line
→ Transcribe converts speech to text (streaming, real-time)
→ Translate converts to English if non-English
→ Comprehend detects sentiment and entities in real time
→ Kendra searches internal knowledge base for relevant answers
→ Agent assist UI surfaces recommended responses
→ Polly reads suggested responses aloud to agent
→ At end of call: full transcript + sentiment + topics stored
Use case: global call centre operations with agent assist.
Pattern 3 - E-Commerce Fraud Prevention
Order placed
→ Fraud Detector evaluates transaction in real time (<100ms)
→ APPROVE: process immediately
→ REVIEW: hold order, trigger review workflow
→ Comprehend analyses customer notes for deception signals
→ Rekognition verifies ID document if requested
→ REJECT: decline and log
→ Outcomes fed back to Fraud Detector for model improvement
Use case: payment fraud prevention.
Pattern 4 - Content Moderation at Scale
User uploads image or video
→ Rekognition scans for unsafe content
→ If safe: publish immediately
→ If flagged: hold for human review
→ Transcribe audio track (if video)
→ Comprehend analyses transcript for policy violations
→ Translate transcript if non-English
→ Final publish/reject decision logged
Use case: social platforms, user-generated content moderation.
Pattern 5 - Custom ML + Managed Services
SageMaker trains custom churn prediction model on customer data
→ Model deployed to SageMaker endpoint
→ Lex chatbot engages at-risk customers
→ Polly delivers personalised audio messages
→ Forecast predicts which customers will churn next month
→ Comprehend analyses support tickets for early churn signals
→ All signals combined in a customer health score
Use case: customer retention in subscription businesses.
Choosing the Right Service
The question you should ask before reaching for any ML service is: is there a pre-trained service that does what I need?
If yes, use it. You get a production-grade model, managed infrastructure, automatic updates, and no ML expertise required.
If the pre-trained service exists but needs domain adaptation (custom vocabulary in Transcribe, custom entities in Comprehend, custom labels in Rekognition), use the customisation features of the managed service before reaching for SageMaker.
If no pre-trained service covers your use case, and the customisation features are insufficient, build with SageMaker.
| I need to... | Use |
|---|---|
| Extract meaning, sentiment, or entities from text | Comprehend |
| Search internal documents with natural language questions | Kendra |
| Build a chatbot or voice interface | Lex |
| Convert text to speech | Polly |
| Analyse images or video | Rekognition |
| Extract structured data from documents and forms | Textract |
| Convert speech or audio recordings to text | Transcribe |
| Translate between languages | Translate |
| Predict future values from historical time-series data | Forecast |
| Detect fraudulent events in real time | Fraud Detector |
| Build a custom ML model for a domain-specific problem | SageMaker |
This is the mental model. These services are not competing with each other; they are layers of capability. Most real architectures use three or more of them in combination.
Machine learning on AWS is not about becoming an ML engineer. It is about knowing which problem maps to which API, understanding what each service needs as input and returns as output, and composing them into systems that solve real problems.
That understanding is what this guide gives you.
Written by Onyedikachi Obidiegwu | Cloud Security Engineer