Fundamentals of Generative AI
Now you graduate from 'what is AI' to 'what is the magical chatbot stuff specifically.' This domain locks down what foundation models are, how tokens and embeddings work, the lifecycle of building one, and which AWS services serve generative AI.
Updated May 21, 2026
The Big Picture
Imagine a child who has read every book in every library on Earth. They cannot reason like an adult, but they can predict, with eerie accuracy, what word should come next in any sentence — because they have seen so many sentences. That is, in essence, a foundation model.
The model is "trained" once, at huge cost (months of compute, billions of dollars). Then anyone can use it. AWS rents you access to this trained brain through Amazon Bedrock. You don't have to build the brain. You just send it text (a "prompt") and it sends text back. That's generative AI in one sentence.
Foundation Model
Task 2.1 — Foundational Generative AI Concepts
▶2.1.1 Generative AI vocabulary you must know cold
| Term | Plain meaning |
|---|---|
| Token | A chunk of text the model treats as one unit. "Hello world!" might be 3 tokens. Roughly: 1 token ≈ 4 characters ≈ ¾ of a word. |
| Tokenization | Process of splitting input text into tokens before the model sees it. |
| Embedding | A list of numbers (vector) that represents meaning. "Dog" and "puppy" produce similar vectors; "dog" and "calculator" do not. |
| Vector | The list of numbers itself. An embedding is a vector. |
| Vector database / vector store | A database optimized to find vectors similar to a query vector. Powers RAG. |
| Context window | How much text (in tokens) a model can see at once. Bigger window = more context, more cost, sometimes more confusion. |
| Transformer | The neural-network architecture behind nearly all modern LLMs. Uses "attention" to weigh which tokens matter for each output token. |
| Attention | The mechanism by which a transformer decides which earlier tokens are relevant when producing each new token. |
| Diffusion model | Architecture used for image and video generation; iteratively denoises random noise into an image. |
| Multi-modal model | Handles multiple types of input/output (text + image, etc.). |
| Prompt | The text you give the model as input. |
| Completion / Response | The text the model gives back. |
| Inference parameters | Settings that control output: temperature, top-p, top-k, max tokens, stop sequences. |
| Hallucination | Model produces confident-sounding output that is factually wrong. |
| Grounding | Connecting model output to a verified source (your docs, a database). The opposite of hallucinating. |
| RAG | Retrieval-Augmented Generation. Look up relevant documents, then have the model answer using those docs as context. |
| Agentic AI | FM + tools + memory + planning loop. Can take multi-step autonomous actions. |
Tokens are not words (this is the #1 confusion)
▶2.1.2 Foundation Model Lifecycle (your diagnostic Q20 miss)
This is the most asked ordering question in Domain 2. Memorize the sequence cold.
- Data selection / curation — choose massive corpora (web text, books, code).
- Model selection (architecture) — choose transformer family, size, etc.
- Pre-training — self-supervised learning on huge unlabeled data. Costs millions, takes months.
- Fine-tuning — adapt to specific tasks/domains using smaller labeled or curated data.
- Evaluation — benchmarks, human review, automated metrics (BLEU, ROUGE, perplexity, etc.).
- Deployment — make available via API or endpoint.
- Feedback / monitoring — observe usage, gather feedback (RLHF), iterate.
Mnemonic
Pre-training vs. fine-tuning
▶2.1.3 Tokens, embeddings, vectors
How a model "reads" a prompt:
- You send:
"What is AWS Bedrock?" - The tokenizer splits it:
["What", " is", " AWS", " Bed", "rock", "?"]— 6 tokens. - Each token is mapped to an embedding (a vector of, e.g., 4096 numbers).
- The transformer processes those vectors to predict the next token.
- It produces one token at a time, repeating until done.
- Each output token also costs money (output tokens are usually priced higher than input tokens).
Why embeddings matter
Vector databases on AWS
| Service | What it is | When to pick |
|---|---|---|
| Amazon OpenSearch Service | Search engine with vector support (k-NN) | Large-scale, full-text + vector hybrid search |
| Aurora PostgreSQL with pgvector | Relational DB with vector extension | Already on Postgres; small-medium vector loads |
| RDS for PostgreSQL | Same idea, smaller scale | Simple use cases |
| Amazon Neptune (Analytics) | Graph DB with vector search | Need graph + vector together |
| Amazon DocumentDB | MongoDB-compatible with vector search | Mongo-style document store |
| Amazon MemoryDB (for Redis) | In-memory vector store | Lowest latency for hot vectors |
| Amazon Kendra (not pure vector) | Managed enterprise search with semantic understanding | Drop-in search across many sources |
Bedrock Knowledge Bases ≠ a vector DB
▶2.1.4 Token-based pricing (v1.1 emphasis)
You are charged per token, broken down by:
- Input tokens — what you send (prompt + context + history)
- Output tokens — what the model generates
- Output tokens are usually 2-5x more expensive than input tokens
| Pricing model | What it means | Best for |
|---|---|---|
| On-demand | Pay per request, per token | Spiky, low-volume, experimentation |
| Provisioned throughput | Reserve dedicated capacity at hourly rate | Steady high-volume production traffic |
| Batch inference | Submit large jobs, often discounted | Async bulk scoring |
| Custom model pricing | Hosting fee for your fine-tuned model + token costs | You've fine-tuned a model on Bedrock |
Cost levers (high-yield exam content)
- Shorter prompts = fewer input tokens.
- Streaming = same total cost, just delivered piece-by-piece (better UX, not cheaper).
- Smaller / cheaper models (Haiku, Nova Micro) for high-volume simple tasks.
- Caching repeated context (Bedrock Prompt Caching) reduces cost on long system prompts.
- Provisioned throughput is only cheaper at sustained high volume; under-utilized provisioning is more expensive than on-demand.
"Most cost-effective" decoder
- "Spiky workload, mostly idle" → on-demand
- "Steady high throughput, predictable" → provisioned throughput
- "Large dataset, no urgency" → batch inference
- "Same long system prompt across many requests" → prompt caching
▶2.1.5 Context engineering vs. prompt engineering (v1.1 addition)
| Prompt engineering | Context engineering | |
|---|---|---|
| What | Crafting a single instruction to the model | Designing the entire information environment a model sees |
| Includes | Wording, role, examples (few-shot) | System prompt + retrieved docs (RAG) + chat history + tools available + memory |
| Scope | One prompt | The whole stack of context |
| Used in | Any LLM call | Agentic systems, RAG pipelines, multi-turn apps |
Heuristic
▶2.1.6 Agentic AI fundamentals (v1.1 — heavily tested)
Agentic AI is the biggest v1.1 addition. Expect 3-5 questions on this topic.
What makes an agent
- Plan— break down the user's goal into steps
- Act — call tools (APIs, databases, code execution, web browsing)
- Observe — read the result of the tool call
- Iterate — decide the next step, repeat until done
- Memory — remember what happened earlier in the session (or across sessions)
AWS agentic AI services (memorize this table)
| Service | What it is | When to pick |
|---|---|---|
| Amazon Bedrock Agents | Managed agents with action groups, knowledge bases, and orchestration. No-code-ish. | Quick managed agent for common patterns; minimal infra |
| Amazon Bedrock AgentCore | A platform of services for building production agents at scale (Runtime, Memory, Observability, Identity, Code Interpreter, Browser, Gateway) | Production-grade, custom agents with serious infrastructure needs |
| Strands Agents (SDK) | Open-source SDK for building agents in code. Runs on AgentCore Runtime. | Code-first developers who want flexibility |
| Amazon Q Developer (agent mode) | Agentic developer assistant — writes/refactors code across files | Developer productivity, code generation |
AgentCore's six services (high yield)
- Runtime — secure, isolated execution environment for agents
- Memory — short-term and long-term memory storage
- Observability — traces, logs, metrics for agent runs
- Identity — agent identity and access management for tools/APIs
- Code Interpreter — sandboxed code execution tool
- Browser — sandboxed web browsing tool
- Gateway — connects agents to external APIs / MCP servers
MCP — Model Context Protocol
A2A — Agent-to-Agent
Bedrock Agents vs. AgentCore
Task 2.2 — Capabilities and Limitations of Generative AI
▶2.2.1 Advantages of generative AI
- Adaptability — one model, many tasks via prompting
- Speed — fast responses, parallelizable
- Personalization — tailored content per user
- Cost reduction — automate work that previously took human hours
- Creativity / generation— produces new content (text, images, code) that didn't exist
- Accessibility — managed services like Bedrock require no ML expertise to use
▶2.2.2 Disadvantages and limitations
- Hallucinations — confident wrong answers
- Non-determinism — same prompt may produce different output
- Interpretability — hard to explain why a specific answer was given
- Bias — inherits bias from training data
- Knowledge cutoff— model doesn't know events after training date
- Cost — token-based pricing can balloon
- Latency — large models are slow per request
- Data privacy — your inputs may be logged unless you configure carefully
- Toxicity / unsafe outputs — without guardrails, can produce harmful content
"Why can a model say something wrong with confidence?"
▶2.2.3 Determining model selection criteria
| Selection factor | What to weigh |
|---|---|
| Task type | Text? Code? Image? Multi-modal? Pick a model built for it. |
| Cost | Per-token rate × expected volume. Use smaller models for high-volume simple tasks. |
| Latency | Smaller / distilled models are faster. |
| Accuracy / quality | Bigger models usually win, but not always; benchmark on your task. |
| Context window | How much input does the model need to handle at once? |
| Customization need | Some models support fine-tuning, others don't. |
| Compliance | Region availability, data residency, certifications. |
| Modality | Text only, or also images / audio / video? |
▶2.2.4 Business value of generative AI
- Cost savings from automation
- Productivity gains — drafts, code, summaries faster
- Customer experience — chatbots, personalization, faster support
- Innovation speed — prototyping, content generation
- Scaling expertise — knowledge assistants for niche domains
Task 2.3 — AWS Infrastructure and Technologies for Generative AI
▶2.3.1 AWS GenAI service catalog (memorize the discriminators)
| Service | What it is | "If you see X, pick Y" |
|---|---|---|
| Amazon Bedrock | Managed access to foundation models via API. No infrastructure. | Default for "use an LLM via API in my app" |
| Amazon SageMaker AI | Build, train, deploy custom ML models end-to-end | "Train my own model" or "deploy custom model" |
| Amazon SageMaker JumpStart | Prebuilt model hub inside SageMaker | "One-click deploy a popular model in SageMaker" |
| Amazon Q Business | Enterprise AI assistant grounded in your company data | "Internal chatbot for employees with company knowledge" |
| Amazon Q Developer | AI coding assistant in IDE / CLI | "Help developers write code" |
| Amazon QuickSight Q (Quick) | Natural-language BI / dashboards | "Ask questions about my data in plain English" |
| Amazon Kiro | Spec-driven AI development environment (v1.1 addition) | "Build AI-driven app with structured specs" |
| AWS Transform | Agentic service for migrating/modernizing legacy workloads (mainframes, .NET, VMware) | "Modernize my legacy application using AI" |
Bedrock vs. SageMaker AI — the most-tested distinction
SageMaker AI = "I want to build, train, fine-tune, and deploy my own models — including non-FM models like XGBoost." Full ML lifecycle.
Both can host FMs. The distinction is who owns the lifecycle: Bedrock → AWS hosts it for you. SageMaker → you manage hosting.
Q Business vs. Q Developer
▶2.3.2 Bedrock specifics (high yield)
- Model access — you must request access to each model family in the console before using it.
- Available models — Anthropic Claude, Amazon Nova / Titan, Meta Llama, Mistral, AI21 Jurassic, Cohere Command, Stability AI Stable Diffusion. Subject to region.
- InvokeModel API — synchronous single call.
- InvokeModelWithResponseStream — streaming output.
- Converse API — unified multi-turn conversation API across model providers.
- Bedrock Knowledge Bases — managed RAG. Connect S3 / SharePoint / Confluence / web; pick a vector store; query.
- Bedrock Guardrails — content filters, denied topics, PII redaction, contextual grounding checks.
- Bedrock Agents — managed agents with action groups (Lambda) and orchestration.
- Bedrock Studio — UI for building Bedrock-powered apps.
- Bedrock Prompt Management — version, test, and compare prompts.
- Bedrock Model Evaluation — automatic and human evaluation jobs.
- Custom models — fine-tune or continue pre-training select models on Bedrock.
- Provisioned Throughput — reserve capacity for steady workloads.
Bedrock = the umbrella
▶2.3.3 SageMaker AI family (high yield)
| Component | What it does |
|---|---|
| SageMaker Studio | Web-based IDE for ML |
| SageMaker JumpStart | Pretrained model hub (FMs and traditional ML) |
| SageMaker Training Jobs | Run custom training |
| SageMaker Endpoints | Host trained models for inference |
| SageMaker Data Wrangler | Data prep / feature engineering UI |
| SageMaker Ground Truth | Human-in-the-loop data labeling |
| SageMaker Feature Store | Reusable feature library |
| SageMaker Pipelines | CI/CD for ML |
| SageMaker Model Registry | Version control for models |
| SageMaker Model Monitor | Detect data / concept / quality / bias drift in production |
| SageMaker Clarify | Bias detection and explainability (SHAP) |
| SageMaker Model Cards | Standardized model documentation for governance |
| SageMaker Canvas | No-code ML for business analysts (note: not core v1.1 in-scope, but may appear as distractor) |
▶2.3.4 Why AWS infrastructure matters for GenAI
- Security & compliance baked in — IAM, KMS encryption, VPC, PrivateLink, audit trails
- Data privacy — your data is not used to train AWS or 3rd-party models when using Bedrock
- Regional availability — meet data residency requirements
- Choice of models — single API, many providers, swap easily
- Integration — connects natively to S3, Lambda, Step Functions, OpenSearch, etc.
- Cost controls — budgets, alarms, provisioned vs. on-demand
- Responsibility model — AWS handles infrastructure security; you handle data and access
"Why pick Bedrock over OpenAI direct?" trope
Cross-cutting Service Comparison: Bedrock vs. SageMaker vs. Q vs. JumpStart
| Bedrock | SageMaker AI | SageMaker JumpStart | Q (Business / Developer) | |
|---|---|---|---|---|
| Primary use | Call FMs via API | Build / train / host custom models | Deploy pretrained models in SageMaker | Use AI assistants directly |
| Audience | App developers | ML engineers | ML engineers | End users / employees / devs |
| Effort | Low — just API | High — full lifecycle | Medium — click-deploy | Lowest — turnkey |
| Customization | Fine-tune select models, RAG, guardrails | Anything you want | Use as-is or fine-tune | Limited config |
| Pricing | Per token, or provisioned | Per training hour + endpoint hours | Per endpoint hour | Per user per month |
Self-Quiz
Question 1
A startup wants to integrate a foundation model into its application via a managed API without provisioning servers, choosing between models from Anthropic and Meta as needs evolve. Which AWS service best fits?
- A. Amazon SageMaker AI
- B. Amazon Bedrock
- C. Amazon Q Developer
- D. Amazon Comprehend
Question 2
Place the foundation model lifecycle stages in the correct order: (1) deployment, (2) data selection, (3) fine-tuning, (4) pre-training, (5) evaluation, (6) feedback/monitoring.
- A. 2 → 4 → 3 → 5 → 1 → 6
- B. 4 → 2 → 5 → 3 → 1 → 6
- C. 2 → 3 → 4 → 5 → 1 → 6
- D. 1 → 2 → 3 → 4 → 5 → 6
Question 3
A company runs a chatbot with consistent traffic of 500 requests per second, 24/7. Which Bedrock pricing model is most cost-effective?
- A. On-demand token pricing
- B. Batch inference
- C. Provisioned throughput
- D. SageMaker serverless inference
Question 4
A team wants to build a custom agent that browses the web, executes code in a sandbox, and maintains long-term memory across user sessions. Which AWS offering should they use?
- A. Amazon Bedrock Agents (managed)
- B. Amazon Bedrock AgentCore
- C. Amazon Q Business
- D. Amazon Lex
Question 5
An LLM produces a confident but factually wrong claim about a company's product. What is this called?
- A. Bias
- B. Drift
- C. Hallucination
- D. Overfitting