← Back to Certifications
Domain 2 · 24% of exam

Fundamentals of Generative AI

Now you graduate from 'what is AI' to 'what is the magical chatbot stuff specifically.' This domain locks down what foundation models are, how tokens and embeddings work, the lifecycle of building one, and which AWS services serve generative AI.

Task statements: 2.1, 2.2, 2.3Estimated questions: ~12 of 50 scored

Updated May 21, 2026

The Big Picture

Imagine a child who has read every book in every library on Earth. They cannot reason like an adult, but they can predict, with eerie accuracy, what word should come next in any sentence — because they have seen so many sentences. That is, in essence, a foundation model.

The model is "trained" once, at huge cost (months of compute, billions of dollars). Then anyone can use it. AWS rents you access to this trained brain through Amazon Bedrock. You don't have to build the brain. You just send it text (a "prompt") and it sends text back. That's generative AI in one sentence.

Foundation Model

A very large pretrained model, trained on broad data using self-supervised learning, that can be adapted (via prompting, fine-tuning, RAG, etc.) to a wide range of tasks. Examples: Claude, Anthropic's models, Amazon Nova, Llama, Titan, Jurassic, Stable Diffusion.

Task 2.1 — Foundational Generative AI Concepts

2.1.1 Generative AI vocabulary you must know cold
TermPlain meaning
TokenA chunk of text the model treats as one unit. "Hello world!" might be 3 tokens. Roughly: 1 token ≈ 4 characters ≈ ¾ of a word.
TokenizationProcess of splitting input text into tokens before the model sees it.
EmbeddingA list of numbers (vector) that represents meaning. "Dog" and "puppy" produce similar vectors; "dog" and "calculator" do not.
VectorThe list of numbers itself. An embedding is a vector.
Vector database / vector storeA database optimized to find vectors similar to a query vector. Powers RAG.
Context windowHow much text (in tokens) a model can see at once. Bigger window = more context, more cost, sometimes more confusion.
TransformerThe neural-network architecture behind nearly all modern LLMs. Uses "attention" to weigh which tokens matter for each output token.
AttentionThe mechanism by which a transformer decides which earlier tokens are relevant when producing each new token.
Diffusion modelArchitecture used for image and video generation; iteratively denoises random noise into an image.
Multi-modal modelHandles multiple types of input/output (text + image, etc.).
PromptThe text you give the model as input.
Completion / ResponseThe text the model gives back.
Inference parametersSettings that control output: temperature, top-p, top-k, max tokens, stop sequences.
HallucinationModel produces confident-sounding output that is factually wrong.
GroundingConnecting model output to a verified source (your docs, a database). The opposite of hallucinating.
RAGRetrieval-Augmented Generation. Look up relevant documents, then have the model answer using those docs as context.
Agentic AIFM + tools + memory + planning loop. Can take multi-step autonomous actions.

Tokens are not words (this is the #1 confusion)

Models don't see words. They see token IDs. The word "unbelievable" might be split into "un", "believ", "able". Pricing is per token, not per word. A 1,000-word article is roughly ~1,300–1,500 tokens.
2.1.2 Foundation Model Lifecycle (your diagnostic Q20 miss)

This is the most asked ordering question in Domain 2. Memorize the sequence cold.

  1. Data selection / curation — choose massive corpora (web text, books, code).
  2. Model selection (architecture) — choose transformer family, size, etc.
  3. Pre-training — self-supervised learning on huge unlabeled data. Costs millions, takes months.
  4. Fine-tuning — adapt to specific tasks/domains using smaller labeled or curated data.
  5. Evaluation — benchmarks, human review, automated metrics (BLEU, ROUGE, perplexity, etc.).
  6. Deployment — make available via API or endpoint.
  7. Feedback / monitoring — observe usage, gather feedback (RLHF), iterate.

Mnemonic

"Data Models Pre-train, Fine-tune Evaluations, Deploy Feedback" — D-M-P-F-E-D-F. Or just remember: build it big, then make it useful, then ship it, then watch it.

Pre-training vs. fine-tuning

Pre-training is the giant initial step using self-supervised learning on unlabeled data — predicting the next word over trillions of tokens. Fine-tuning is the smaller follow-up step using curated, often labeled data to specialize the model. They are not the same. The exam loves to swap them.
2.1.3 Tokens, embeddings, vectors

How a model "reads" a prompt:

  1. You send: "What is AWS Bedrock?"
  2. The tokenizer splits it: ["What", " is", " AWS", " Bed", "rock", "?"] — 6 tokens.
  3. Each token is mapped to an embedding (a vector of, e.g., 4096 numbers).
  4. The transformer processes those vectors to predict the next token.
  5. It produces one token at a time, repeating until done.
  6. Each output token also costs money (output tokens are usually priced higher than input tokens).

Why embeddings matter

Two pieces of text with similar meaning produce similar embeddings. That's why a vector database can find "similar" documents to a question — it does math (cosine similarity, etc.) on the vectors. This is the heart of RAG.

Vector databases on AWS

ServiceWhat it isWhen to pick
Amazon OpenSearch ServiceSearch engine with vector support (k-NN)Large-scale, full-text + vector hybrid search
Aurora PostgreSQL with pgvectorRelational DB with vector extensionAlready on Postgres; small-medium vector loads
RDS for PostgreSQLSame idea, smaller scaleSimple use cases
Amazon Neptune (Analytics)Graph DB with vector searchNeed graph + vector together
Amazon DocumentDBMongoDB-compatible with vector searchMongo-style document store
Amazon MemoryDB (for Redis)In-memory vector storeLowest latency for hot vectors
Amazon Kendra (not pure vector)Managed enterprise search with semantic understandingDrop-in search across many sources

Bedrock Knowledge Bases ≠ a vector DB

Bedrock Knowledge Bases is a managed RAG service. It uses a vector store under the hood (you pick OpenSearch Serverless, Aurora PostgreSQL, Pinecone, Redis Enterprise Cloud, MongoDB Atlas, etc.). When the question asks "what's the vector store" — pick the actual store, not Knowledge Bases.
2.1.4 Token-based pricing (v1.1 emphasis)

You are charged per token, broken down by:

  • Input tokens — what you send (prompt + context + history)
  • Output tokens — what the model generates
  • Output tokens are usually 2-5x more expensive than input tokens
Pricing modelWhat it meansBest for
On-demandPay per request, per tokenSpiky, low-volume, experimentation
Provisioned throughputReserve dedicated capacity at hourly rateSteady high-volume production traffic
Batch inferenceSubmit large jobs, often discountedAsync bulk scoring
Custom model pricingHosting fee for your fine-tuned model + token costsYou've fine-tuned a model on Bedrock

Cost levers (high-yield exam content)

  • Shorter prompts = fewer input tokens.
  • Streaming = same total cost, just delivered piece-by-piece (better UX, not cheaper).
  • Smaller / cheaper models (Haiku, Nova Micro) for high-volume simple tasks.
  • Caching repeated context (Bedrock Prompt Caching) reduces cost on long system prompts.
  • Provisioned throughput is only cheaper at sustained high volume; under-utilized provisioning is more expensive than on-demand.

"Most cost-effective" decoder

  • "Spiky workload, mostly idle" → on-demand
  • "Steady high throughput, predictable" → provisioned throughput
  • "Large dataset, no urgency" → batch inference
  • "Same long system prompt across many requests" → prompt caching
2.1.5 Context engineering vs. prompt engineering (v1.1 addition)
Prompt engineeringContext engineering
WhatCrafting a single instruction to the modelDesigning the entire information environment a model sees
IncludesWording, role, examples (few-shot)System prompt + retrieved docs (RAG) + chat history + tools available + memory
ScopeOne promptThe whole stack of context
Used inAny LLM callAgentic systems, RAG pipelines, multi-turn apps

Heuristic

Prompt engineering is writing the question. Context engineering is arranging everything around the question — what facts, tools, history, and constraints the model has access to before answering.
2.1.6 Agentic AI fundamentals (v1.1 — heavily tested)

Agentic AI is the biggest v1.1 addition. Expect 3-5 questions on this topic.

What makes an agent

  1. Plan— break down the user's goal into steps
  2. Act — call tools (APIs, databases, code execution, web browsing)
  3. Observe — read the result of the tool call
  4. Iterate — decide the next step, repeat until done
  5. Memory — remember what happened earlier in the session (or across sessions)

AWS agentic AI services (memorize this table)

ServiceWhat it isWhen to pick
Amazon Bedrock AgentsManaged agents with action groups, knowledge bases, and orchestration. No-code-ish.Quick managed agent for common patterns; minimal infra
Amazon Bedrock AgentCoreA platform of services for building production agents at scale (Runtime, Memory, Observability, Identity, Code Interpreter, Browser, Gateway)Production-grade, custom agents with serious infrastructure needs
Strands Agents (SDK)Open-source SDK for building agents in code. Runs on AgentCore Runtime.Code-first developers who want flexibility
Amazon Q Developer (agent mode)Agentic developer assistant — writes/refactors code across filesDeveloper productivity, code generation

AgentCore's six services (high yield)

  • Runtime — secure, isolated execution environment for agents
  • Memory — short-term and long-term memory storage
  • Observability — traces, logs, metrics for agent runs
  • Identity — agent identity and access management for tools/APIs
  • Code Interpreter — sandboxed code execution tool
  • Browser — sandboxed web browsing tool
  • Gateway — connects agents to external APIs / MCP servers

MCP — Model Context Protocol

An open protocol for connecting AI agents to external tools and data sources. Think of it as "USB for AI agents" — a standard plug. AgentCore Gateway speaks MCP.

A2A — Agent-to-Agent

Pattern where multiple specialized agents talk to each other to solve a problem (e.g., a "planner" agent delegates to a "coder" agent and a "reviewer" agent).

Bedrock Agents vs. AgentCore

Both produce "agents" but they are different layers. Bedrock Agents = the high-level managed offering. AgentCore = the underlying platform components for building customproduction agents. If a question says "I want a quick managed agent with action groups and knowledge bases" → Bedrock Agents. "I'm building a custom agent and need scalable runtime / memory / browser tool" → AgentCore.

Task 2.2 — Capabilities and Limitations of Generative AI

2.2.1 Advantages of generative AI
  • Adaptability — one model, many tasks via prompting
  • Speed — fast responses, parallelizable
  • Personalization — tailored content per user
  • Cost reduction — automate work that previously took human hours
  • Creativity / generation— produces new content (text, images, code) that didn't exist
  • Accessibility — managed services like Bedrock require no ML expertise to use
2.2.2 Disadvantages and limitations
  • Hallucinations — confident wrong answers
  • Non-determinism — same prompt may produce different output
  • Interpretability — hard to explain why a specific answer was given
  • Bias — inherits bias from training data
  • Knowledge cutoff— model doesn't know events after training date
  • Cost — token-based pricing can balloon
  • Latency — large models are slow per request
  • Data privacy — your inputs may be logged unless you configure carefully
  • Toxicity / unsafe outputs — without guardrails, can produce harmful content

"Why can a model say something wrong with confidence?"

Because LLMs predict the most likely next token; they have no concept of truth. Their confidence is statistical, not factual. This is testable territory. The fix: RAG, grounding, fact-checking, guardrails — not "train it harder."
2.2.3 Determining model selection criteria
Selection factorWhat to weigh
Task typeText? Code? Image? Multi-modal? Pick a model built for it.
CostPer-token rate × expected volume. Use smaller models for high-volume simple tasks.
LatencySmaller / distilled models are faster.
Accuracy / qualityBigger models usually win, but not always; benchmark on your task.
Context windowHow much input does the model need to handle at once?
Customization needSome models support fine-tuning, others don't.
ComplianceRegion availability, data residency, certifications.
ModalityText only, or also images / audio / video?
2.2.4 Business value of generative AI
  • Cost savings from automation
  • Productivity gains — drafts, code, summaries faster
  • Customer experience — chatbots, personalization, faster support
  • Innovation speed — prototyping, content generation
  • Scaling expertise — knowledge assistants for niche domains

Task 2.3 — AWS Infrastructure and Technologies for Generative AI

2.3.1 AWS GenAI service catalog (memorize the discriminators)
ServiceWhat it is"If you see X, pick Y"
Amazon BedrockManaged access to foundation models via API. No infrastructure.Default for "use an LLM via API in my app"
Amazon SageMaker AIBuild, train, deploy custom ML models end-to-end"Train my own model" or "deploy custom model"
Amazon SageMaker JumpStartPrebuilt model hub inside SageMaker"One-click deploy a popular model in SageMaker"
Amazon Q BusinessEnterprise AI assistant grounded in your company data"Internal chatbot for employees with company knowledge"
Amazon Q DeveloperAI coding assistant in IDE / CLI"Help developers write code"
Amazon QuickSight Q (Quick)Natural-language BI / dashboards"Ask questions about my data in plain English"
Amazon KiroSpec-driven AI development environment (v1.1 addition)"Build AI-driven app with structured specs"
AWS TransformAgentic service for migrating/modernizing legacy workloads (mainframes, .NET, VMware)"Modernize my legacy application using AI"

Bedrock vs. SageMaker AI — the most-tested distinction

Bedrock= "I want to call a foundation model via API. I don't want to manage anything." Models from Anthropic, Amazon, Meta, AI21, Cohere, Stability, Mistral.
SageMaker AI = "I want to build, train, fine-tune, and deploy my own models — including non-FM models like XGBoost." Full ML lifecycle.
Both can host FMs. The distinction is who owns the lifecycle: Bedrock → AWS hosts it for you. SageMaker → you manage hosting.

Q Business vs. Q Developer

Both are "Amazon Q." Q Business is for end-users in companies asking questions about their company. Q Developer is for software engineers asking questions about code. Don't pick one when the other is meant.
2.3.2 Bedrock specifics (high yield)
  • Model access — you must request access to each model family in the console before using it.
  • Available models — Anthropic Claude, Amazon Nova / Titan, Meta Llama, Mistral, AI21 Jurassic, Cohere Command, Stability AI Stable Diffusion. Subject to region.
  • InvokeModel API — synchronous single call.
  • InvokeModelWithResponseStream — streaming output.
  • Converse API — unified multi-turn conversation API across model providers.
  • Bedrock Knowledge Bases — managed RAG. Connect S3 / SharePoint / Confluence / web; pick a vector store; query.
  • Bedrock Guardrails — content filters, denied topics, PII redaction, contextual grounding checks.
  • Bedrock Agents — managed agents with action groups (Lambda) and orchestration.
  • Bedrock Studio — UI for building Bedrock-powered apps.
  • Bedrock Prompt Management — version, test, and compare prompts.
  • Bedrock Model Evaluation — automatic and human evaluation jobs.
  • Custom models — fine-tune or continue pre-training select models on Bedrock.
  • Provisioned Throughput — reserve capacity for steady workloads.

Bedrock = the umbrella

When in doubt and the question says "managed foundation model service" or "no infrastructure to manage," it's Bedrock. Bedrock is the default GenAI service on AWS for the exam.
2.3.3 SageMaker AI family (high yield)
ComponentWhat it does
SageMaker StudioWeb-based IDE for ML
SageMaker JumpStartPretrained model hub (FMs and traditional ML)
SageMaker Training JobsRun custom training
SageMaker EndpointsHost trained models for inference
SageMaker Data WranglerData prep / feature engineering UI
SageMaker Ground TruthHuman-in-the-loop data labeling
SageMaker Feature StoreReusable feature library
SageMaker PipelinesCI/CD for ML
SageMaker Model RegistryVersion control for models
SageMaker Model MonitorDetect data / concept / quality / bias drift in production
SageMaker ClarifyBias detection and explainability (SHAP)
SageMaker Model CardsStandardized model documentation for governance
SageMaker CanvasNo-code ML for business analysts (note: not core v1.1 in-scope, but may appear as distractor)
2.3.4 Why AWS infrastructure matters for GenAI
  • Security & compliance baked in — IAM, KMS encryption, VPC, PrivateLink, audit trails
  • Data privacy — your data is not used to train AWS or 3rd-party models when using Bedrock
  • Regional availability — meet data residency requirements
  • Choice of models — single API, many providers, swap easily
  • Integration — connects natively to S3, Lambda, Step Functions, OpenSearch, etc.
  • Cost controls — budgets, alarms, provisioned vs. on-demand
  • Responsibility model — AWS handles infrastructure security; you handle data and access

"Why pick Bedrock over OpenAI direct?" trope

Enterprise compliance, data privacy guarantees, IAM-based access control, integration with existing AWS services, regional data residency, no model lock-in. This bullet list is the answer template for any "why AWS for GenAI" question.

Cross-cutting Service Comparison: Bedrock vs. SageMaker vs. Q vs. JumpStart

BedrockSageMaker AISageMaker JumpStartQ (Business / Developer)
Primary useCall FMs via APIBuild / train / host custom modelsDeploy pretrained models in SageMakerUse AI assistants directly
AudienceApp developersML engineersML engineersEnd users / employees / devs
EffortLow — just APIHigh — full lifecycleMedium — click-deployLowest — turnkey
CustomizationFine-tune select models, RAG, guardrailsAnything you wantUse as-is or fine-tuneLimited config
PricingPer token, or provisionedPer training hour + endpoint hoursPer endpoint hourPer user per month

Self-Quiz

Question 1

A startup wants to integrate a foundation model into its application via a managed API without provisioning servers, choosing between models from Anthropic and Meta as needs evolve. Which AWS service best fits?

  • A. Amazon SageMaker AI
  • B. Amazon Bedrock
  • C. Amazon Q Developer
  • D. Amazon Comprehend

Question 2

Place the foundation model lifecycle stages in the correct order: (1) deployment, (2) data selection, (3) fine-tuning, (4) pre-training, (5) evaluation, (6) feedback/monitoring.

  • A. 2 → 4 → 3 → 5 → 1 → 6
  • B. 4 → 2 → 5 → 3 → 1 → 6
  • C. 2 → 3 → 4 → 5 → 1 → 6
  • D. 1 → 2 → 3 → 4 → 5 → 6

Question 3

A company runs a chatbot with consistent traffic of 500 requests per second, 24/7. Which Bedrock pricing model is most cost-effective?

  • A. On-demand token pricing
  • B. Batch inference
  • C. Provisioned throughput
  • D. SageMaker serverless inference

Question 4

A team wants to build a custom agent that browses the web, executes code in a sandbox, and maintains long-term memory across user sessions. Which AWS offering should they use?

  • A. Amazon Bedrock Agents (managed)
  • B. Amazon Bedrock AgentCore
  • C. Amazon Q Business
  • D. Amazon Lex

Question 5

An LLM produces a confident but factually wrong claim about a company's product. What is this called?

  • A. Bias
  • B. Drift
  • C. Hallucination
  • D. Overfitting

Flashcards


External Resources for Domain 2