Domain 2 · 24% of exam

Fundamentals of Generative AI

Now you graduate from 'what is AI' to 'what is the magical chatbot stuff specifically.' This domain locks down what foundation models are, how tokens and embeddings work, the lifecycle of building one, and which AWS services serve generative AI.

Task statements: 2.1, 2.2, 2.3Estimated questions: ~12 of 50 scored

Updated May 21, 2026

The Big Picture

Imagine a child who has read every book in every library on Earth. They cannot reason like an adult, but they can predict, with eerie accuracy, what word should come next in any sentence — because they have seen so many sentences. That is, in essence, a foundation model.

The model is "trained" once, at huge cost (months of compute, billions of dollars). Then anyone can use it. AWS rents you access to this trained brain through Amazon Bedrock. You don't have to build the brain. You just send it text (a "prompt") and it sends text back. That's generative AI in one sentence.

Foundation Model

A very large pretrained model, trained on broad data using self-supervised learning, that can be adapted (via prompting, fine-tuning, RAG, etc.) to a wide range of tasks. Examples: Claude, Anthropic's models, Amazon Nova, Llama, Titan, Jurassic, Stable Diffusion.

Task 2.1 — Foundational Generative AI Concepts

▶2.1.1 Generative AI vocabulary you must know cold

Term	Plain meaning
Token	A chunk of text the model treats as one unit. "Hello world!" might be 3 tokens. Roughly: 1 token ≈ 4 characters ≈ ¾ of a word.
Tokenization	Process of splitting input text into tokens before the model sees it.
Embedding	A list of numbers (vector) that represents meaning. "Dog" and "puppy" produce similar vectors; "dog" and "calculator" do not.
Vector	The list of numbers itself. An embedding is a vector.
Vector database / vector store	A database optimized to find vectors similar to a query vector. Powers RAG.
Context window	How much text (in tokens) a model can see at once. Bigger window = more context, more cost, sometimes more confusion.
Transformer	The neural-network architecture behind nearly all modern LLMs. Uses "attention" to weigh which tokens matter for each output token.
Attention	The mechanism by which a transformer decides which earlier tokens are relevant when producing each new token.
Diffusion model	Architecture used for image and video generation; iteratively denoises random noise into an image.
Multi-modal model	Handles multiple types of input/output (text + image, etc.).
Prompt	The text you give the model as input.
Completion / Response	The text the model gives back.
Inference parameters	Settings that control output: temperature, top-p, top-k, max tokens, stop sequences.
Hallucination	Model produces confident-sounding output that is factually wrong.
Grounding	Connecting model output to a verified source (your docs, a database). The opposite of hallucinating.
RAG	Retrieval-Augmented Generation. Look up relevant documents, then have the model answer using those docs as context.
Agentic AI	FM + tools + memory + planning loop. Can take multi-step autonomous actions.

Tokens are not words (this is the #1 confusion)

Models don't see words. They see token IDs. The word "unbelievable" might be split into "un", "believ", "able". Pricing is per token, not per word. A 1,000-word article is roughly ~1,300–1,500 tokens.

▶2.1.2 Foundation Model Lifecycle (your diagnostic Q20 miss)

This is the most asked ordering question in Domain 2. Memorize the sequence cold.

Data selection / curation — choose massive corpora (web text, books, code).
Model selection (architecture) — choose transformer family, size, etc.
Pre-training — self-supervised learning on huge unlabeled data. Costs millions, takes months.
Fine-tuning — adapt to specific tasks/domains using smaller labeled or curated data.
Evaluation — benchmarks, human review, automated metrics (BLEU, ROUGE, perplexity, etc.).
Deployment — make available via API or endpoint.
Feedback / monitoring — observe usage, gather feedback (RLHF), iterate.

Mnemonic

"Data Models Pre-train, Fine-tune Evaluations, Deploy Feedback" — D-M-P-F-E-D-F. Or just remember: build it big, then make it useful, then ship it, then watch it.

Pre-training vs. fine-tuning

Pre-training is the giant initial step using self-supervised learning on unlabeled data — predicting the next word over trillions of tokens. Fine-tuning is the smaller follow-up step using curated, often labeled data to specialize the model. They are not the same. The exam loves to swap them.

▶2.1.3 Tokens, embeddings, vectors

How a model "reads" a prompt:

You send: "What is AWS Bedrock?"
The tokenizer splits it: ["What", " is", " AWS", " Bed", "rock", "?"] — 6 tokens.
Each token is mapped to an embedding (a vector of, e.g., 4096 numbers).
The transformer processes those vectors to predict the next token.
It produces one token at a time, repeating until done.
Each output token also costs money (output tokens are usually priced higher than input tokens).

Why embeddings matter

Two pieces of text with similar meaning produce similar embeddings. That's why a vector database can find "similar" documents to a question — it does math (cosine similarity, etc.) on the vectors. This is the heart of RAG.

Vector databases on AWS

Service	What it is	When to pick
Amazon OpenSearch Service	Search engine with vector support (k-NN)	Large-scale, full-text + vector hybrid search
Aurora PostgreSQL with pgvector	Relational DB with vector extension	Already on Postgres; small-medium vector loads
RDS for PostgreSQL	Same idea, smaller scale	Simple use cases
Amazon Neptune (Analytics)	Graph DB with vector search	Need graph + vector together
Amazon DocumentDB	MongoDB-compatible with vector search	Mongo-style document store
Amazon MemoryDB (for Redis)	In-memory vector store	Lowest latency for hot vectors
Amazon Kendra (not pure vector)	Managed enterprise search with semantic understanding	Drop-in search across many sources

Bedrock Knowledge Bases ≠ a vector DB

Bedrock Knowledge Bases is a managed RAG service. It uses a vector store under the hood (you pick OpenSearch Serverless, Aurora PostgreSQL, Pinecone, Redis Enterprise Cloud, MongoDB Atlas, etc.). When the question asks "what's the vector store" — pick the actual store, not Knowledge Bases.

▶2.1.4 Token-based pricing (v1.1 emphasis)

You are charged per token, broken down by:

Input tokens — what you send (prompt + context + history)
Output tokens — what the model generates
Output tokens are usually 2-5x more expensive than input tokens

Pricing model	What it means	Best for
On-demand	Pay per request, per token	Spiky, low-volume, experimentation
Provisioned throughput	Reserve dedicated capacity at hourly rate	Steady high-volume production traffic
Batch inference	Submit large jobs, often discounted	Async bulk scoring
Custom model pricing	Hosting fee for your fine-tuned model + token costs	You've fine-tuned a model on Bedrock

Cost levers (high-yield exam content)

Shorter prompts = fewer input tokens.
Streaming = same total cost, just delivered piece-by-piece (better UX, not cheaper).
Smaller / cheaper models (Haiku, Nova Micro) for high-volume simple tasks.
Caching repeated context (Bedrock Prompt Caching) reduces cost on long system prompts.
Provisioned throughput is only cheaper at sustained high volume; under-utilized provisioning is more expensive than on-demand.

"Most cost-effective" decoder

"Spiky workload, mostly idle" → on-demand
"Steady high throughput, predictable" → provisioned throughput
"Large dataset, no urgency" → batch inference
"Same long system prompt across many requests" → prompt caching

▶2.1.5 Context engineering vs. prompt engineering (v1.1 addition)

	Prompt engineering	Context engineering
What	Crafting a single instruction to the model	Designing the entire information environment a model sees
Includes	Wording, role, examples (few-shot)	System prompt + retrieved docs (RAG) + chat history + tools available + memory
Scope	One prompt	The whole stack of context
Used in	Any LLM call	Agentic systems, RAG pipelines, multi-turn apps

Heuristic

Prompt engineering is writing the question. Context engineering is arranging everything around the question — what facts, tools, history, and constraints the model has access to before answering.

▶2.1.6 Agentic AI fundamentals (v1.1 — heavily tested)

Agentic AI is the biggest v1.1 addition. Expect 3-5 questions on this topic.

What makes an agent

Plan— break down the user's goal into steps
Act — call tools (APIs, databases, code execution, web browsing)
Observe — read the result of the tool call
Iterate — decide the next step, repeat until done
Memory — remember what happened earlier in the session (or across sessions)

AWS agentic AI services (memorize this table)

Service	What it is	When to pick
Amazon Bedrock Agents	Managed agents with action groups, knowledge bases, and orchestration. No-code-ish.	Quick managed agent for common patterns; minimal infra
Amazon Bedrock AgentCore	A platform of services for building production agents at scale (Runtime, Memory, Observability, Identity, Code Interpreter, Browser, Gateway)	Production-grade, custom agents with serious infrastructure needs
Strands Agents (SDK)	Open-source SDK for building agents in code. Runs on AgentCore Runtime.	Code-first developers who want flexibility
Amazon Q Developer (agent mode)	Agentic developer assistant — writes/refactors code across files	Developer productivity, code generation

AgentCore's six services (high yield)

Runtime — secure, isolated execution environment for agents
Memory — short-term and long-term memory storage
Observability — traces, logs, metrics for agent runs
Identity — agent identity and access management for tools/APIs
Code Interpreter — sandboxed code execution tool
Browser — sandboxed web browsing tool
Gateway — connects agents to external APIs / MCP servers

MCP — Model Context Protocol

An open protocol for connecting AI agents to external tools and data sources. Think of it as "USB for AI agents" — a standard plug. AgentCore Gateway speaks MCP.

A2A — Agent-to-Agent

Pattern where multiple specialized agents talk to each other to solve a problem (e.g., a "planner" agent delegates to a "coder" agent and a "reviewer" agent).

Bedrock Agents vs. AgentCore

Both produce "agents" but they are different layers. Bedrock Agents = the high-level managed offering. AgentCore = the underlying platform components for building customproduction agents. If a question says "I want a quick managed agent with action groups and knowledge bases" → Bedrock Agents. "I'm building a custom agent and need scalable runtime / memory / browser tool" → AgentCore.

Task 2.2 — Capabilities and Limitations of Generative AI

▶2.2.1 Advantages of generative AI

Adaptability — one model, many tasks via prompting
Speed — fast responses, parallelizable
Personalization — tailored content per user
Cost reduction — automate work that previously took human hours
Creativity / generation— produces new content (text, images, code) that didn't exist
Accessibility — managed services like Bedrock require no ML expertise to use

▶2.2.2 Disadvantages and limitations

Hallucinations — confident wrong answers
Non-determinism — same prompt may produce different output
Interpretability — hard to explain why a specific answer was given
Bias — inherits bias from training data
Knowledge cutoff— model doesn't know events after training date
Cost — token-based pricing can balloon
Latency — large models are slow per request
Data privacy — your inputs may be logged unless you configure carefully
Toxicity / unsafe outputs — without guardrails, can produce harmful content

"Why can a model say something wrong with confidence?"

Because LLMs predict the most likely next token; they have no concept of truth. Their confidence is statistical, not factual. This is testable territory. The fix: RAG, grounding, fact-checking, guardrails — not "train it harder."

▶2.2.3 Determining model selection criteria

Selection factor	What to weigh
Task type	Text? Code? Image? Multi-modal? Pick a model built for it.
Cost	Per-token rate × expected volume. Use smaller models for high-volume simple tasks.
Latency	Smaller / distilled models are faster.
Accuracy / quality	Bigger models usually win, but not always; benchmark on your task.
Context window	How much input does the model need to handle at once?
Customization need	Some models support fine-tuning, others don't.
Compliance	Region availability, data residency, certifications.
Modality	Text only, or also images / audio / video?

▶2.2.4 Business value of generative AI

Cost savings from automation
Productivity gains — drafts, code, summaries faster
Customer experience — chatbots, personalization, faster support
Innovation speed — prototyping, content generation
Scaling expertise — knowledge assistants for niche domains

Task 2.3 — AWS Infrastructure and Technologies for Generative AI

▶2.3.1 AWS GenAI service catalog (memorize the discriminators)

Service	What it is	"If you see X, pick Y"
Amazon Bedrock	Managed access to foundation models via API. No infrastructure.	Default for "use an LLM via API in my app"
Amazon SageMaker AI	Build, train, deploy custom ML models end-to-end	"Train my own model" or "deploy custom model"
Amazon SageMaker JumpStart	Prebuilt model hub inside SageMaker	"One-click deploy a popular model in SageMaker"
Amazon Q Business	Enterprise AI assistant grounded in your company data	"Internal chatbot for employees with company knowledge"
Amazon Q Developer	AI coding assistant in IDE / CLI	"Help developers write code"
Amazon QuickSight Q (Quick)	Natural-language BI / dashboards	"Ask questions about my data in plain English"
Amazon Kiro	Spec-driven AI development environment (v1.1 addition)	"Build AI-driven app with structured specs"
AWS Transform	Agentic service for migrating/modernizing legacy workloads (mainframes, .NET, VMware)	"Modernize my legacy application using AI"

Bedrock vs. SageMaker AI — the most-tested distinction

Bedrock= "I want to call a foundation model via API. I don't want to manage anything." Models from Anthropic, Amazon, Meta, AI21, Cohere, Stability, Mistral.
SageMaker AI = "I want to build, train, fine-tune, and deploy my own models — including non-FM models like XGBoost." Full ML lifecycle.
Both can host FMs. The distinction is who owns the lifecycle: Bedrock → AWS hosts it for you. SageMaker → you manage hosting.

Q Business vs. Q Developer

Both are "Amazon Q." Q Business is for end-users in companies asking questions about their company. Q Developer is for software engineers asking questions about code. Don't pick one when the other is meant.

▶2.3.2 Bedrock specifics (high yield)

Model access — you must request access to each model family in the console before using it.
Available models — Anthropic Claude, Amazon Nova / Titan, Meta Llama, Mistral, AI21 Jurassic, Cohere Command, Stability AI Stable Diffusion. Subject to region.
InvokeModel API — synchronous single call.
InvokeModelWithResponseStream — streaming output.
Converse API — unified multi-turn conversation API across model providers.
Bedrock Knowledge Bases — managed RAG. Connect S3 / SharePoint / Confluence / web; pick a vector store; query.
Bedrock Guardrails — content filters, denied topics, PII redaction, contextual grounding checks.
Bedrock Agents — managed agents with action groups (Lambda) and orchestration.
Bedrock Studio — UI for building Bedrock-powered apps.
Bedrock Prompt Management — version, test, and compare prompts.
Bedrock Model Evaluation — automatic and human evaluation jobs.
Custom models — fine-tune or continue pre-training select models on Bedrock.
Provisioned Throughput — reserve capacity for steady workloads.

Bedrock = the umbrella

When in doubt and the question says "managed foundation model service" or "no infrastructure to manage," it's Bedrock. Bedrock is the default GenAI service on AWS for the exam.

▶2.3.3 SageMaker AI family (high yield)

Component	What it does
SageMaker Studio	Web-based IDE for ML
SageMaker JumpStart	Pretrained model hub (FMs and traditional ML)
SageMaker Training Jobs	Run custom training
SageMaker Endpoints	Host trained models for inference
SageMaker Data Wrangler	Data prep / feature engineering UI
SageMaker Ground Truth	Human-in-the-loop data labeling
SageMaker Feature Store	Reusable feature library
SageMaker Pipelines	CI/CD for ML
SageMaker Model Registry	Version control for models
SageMaker Model Monitor	Detect data / concept / quality / bias drift in production
SageMaker Clarify	Bias detection and explainability (SHAP)
SageMaker Model Cards	Standardized model documentation for governance
SageMaker Canvas	No-code ML for business analysts (note: not core v1.1 in-scope, but may appear as distractor)

▶2.3.4 Why AWS infrastructure matters for GenAI

Security & compliance baked in — IAM, KMS encryption, VPC, PrivateLink, audit trails
Data privacy — your data is not used to train AWS or 3rd-party models when using Bedrock
Regional availability — meet data residency requirements
Choice of models — single API, many providers, swap easily
Integration — connects natively to S3, Lambda, Step Functions, OpenSearch, etc.
Cost controls — budgets, alarms, provisioned vs. on-demand
Responsibility model — AWS handles infrastructure security; you handle data and access

"Why pick Bedrock over OpenAI direct?" trope

Enterprise compliance, data privacy guarantees, IAM-based access control, integration with existing AWS services, regional data residency, no model lock-in. This bullet list is the answer template for any "why AWS for GenAI" question.

Cross-cutting Service Comparison: Bedrock vs. SageMaker vs. Q vs. JumpStart

	Bedrock	SageMaker AI	SageMaker JumpStart	Q (Business / Developer)
Primary use	Call FMs via API	Build / train / host custom models	Deploy pretrained models in SageMaker	Use AI assistants directly
Audience	App developers	ML engineers	ML engineers	End users / employees / devs
Effort	Low — just API	High — full lifecycle	Medium — click-deploy	Lowest — turnkey
Customization	Fine-tune select models, RAG, guardrails	Anything you want	Use as-is or fine-tune	Limited config
Pricing	Per token, or provisioned	Per training hour + endpoint hours	Per endpoint hour	Per user per month

Self-Quiz

Question 1

A startup wants to integrate a foundation model into its application via a managed API without provisioning servers, choosing between models from Anthropic and Meta as needs evolve. Which AWS service best fits?

A. Amazon SageMaker AI
B. Amazon Bedrock
C. Amazon Q Developer
D. Amazon Comprehend

Question 2

Place the foundation model lifecycle stages in the correct order: (1) deployment, (2) data selection, (3) fine-tuning, (4) pre-training, (5) evaluation, (6) feedback/monitoring.

A. 2 → 4 → 3 → 5 → 1 → 6
B. 4 → 2 → 5 → 3 → 1 → 6
C. 2 → 3 → 4 → 5 → 1 → 6
D. 1 → 2 → 3 → 4 → 5 → 6

Question 3

A company runs a chatbot with consistent traffic of 500 requests per second, 24/7. Which Bedrock pricing model is most cost-effective?

A. On-demand token pricing
B. Batch inference
C. Provisioned throughput
D. SageMaker serverless inference

Question 4

A team wants to build a custom agent that browses the web, executes code in a sandbox, and maintains long-term memory across user sessions. Which AWS offering should they use?

A. Amazon Bedrock Agents (managed)
B. Amazon Bedrock AgentCore
C. Amazon Q Business
D. Amazon Lex

Question 5

An LLM produces a confident but factually wrong claim about a company's product. What is this called?

A. Bias
B. Drift
C. Hallucination
D. Overfitting

Flashcards

External Resources for Domain 2

← Domain 1 — Fundamentals of AI and ML Home Domain 3 — Applications of Foundation Models→