Where We Left Off
In Module 1.1 you learned that an LLM generates text by predicting the most probable next token, one token at a time. That's the core mechanic.
But now you might be wondering:
- What exactly makes AI "generative"?
- What's actually happening inside ChatGPT when I send a message?
- Why does it sometimes get things wrong if it's so powerful?
- Why does it respond differently every time to the same question?
All of that — by the end of this module.
What Does "Generative" Actually Mean?
There are two broad categories of AI models:
Discriminative AI — learns to tell things apart.
Input: [image of a cat]
Task: Is this a cat or a dog?
Output: "Cat" — 94% confidence
It draws a boundary between categories. It classifies. It judges. It does NOT create anything new.
Generative AI — learns the underlying patterns of data so deeply that it can create brand new data that looks like it came from the same distribution.
Input: "Write me a poem about rain"
Task: Generate new text that fits this request
Output: A poem that has never existed before
It doesn't retrieve a stored poem. It doesn't copy-paste from its training data. It generates something new — token by token — using patterns it learned during training.
This is the core idea:
Generative AI has learned the patterns of its training data well enough to produce new, original content that fits those same patterns.
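A generative image model doesn't store millions of images. It learns what makes an image look realistic, and then constructs a new one from scratch (diffusion models, for example, do this by gradually refining random noise rather than drawing pixel by pixel).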
A generative language model doesn't store billions of sentences — it learns what makes text sound coherent, and constructs new sentences token by token.
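To make "learning patterns, then generating" concrete, here is a deliberately tiny sketch: a bigram model that only records which word follows which in a three-sentence corpus, then samples a new sentence from those counts. The corpus, the `<s>`/`</s>` markers, and the function names are all made up for illustration. A real LLM learns far richer patterns with a neural network, but the principle (learn the patterns, then sample from them) is the same.

```python
import random
from collections import defaultdict

def train_bigram(corpus):
    """Record which word follows which: the 'patterns' of this tiny dataset."""
    follows = defaultdict(list)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, nxt in zip(words, words[1:]):
            follows[current].append(nxt)
    return follows

def generate(follows, rng, max_words=20):
    """Sample a new sentence one word at a time from the learned patterns."""
    word, output = "<s>", []
    for _ in range(max_words):
        word = rng.choice(follows[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

# Made-up training data for illustration only.
corpus = [
    "the rain falls on the roof",
    "the rain sings on the window",
    "soft rain falls on the quiet street",
]
model = train_bigram(corpus)
sentence = generate(model, random.Random(0))
print(sentence)
```

The output may or may not match one of the training sentences word for word. Nothing is retrieved as a whole sentence; every word is chosen because it followed the previous word somewhere in the training data.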
Types of Generative AI
Generative AI is not just ChatGPT. It's a family of models:
| Model Type | What it Generates | Examples |
|---|---|---|
| LLM | Text | GPT-4, Claude, Gemini |
| Image Generation | Images | DALL-E, Midjourney, Stable Diffusion |
| Audio Generation | Music, Speech | Suno, ElevenLabs |
| Video Generation | Video | Sora, Runway |
| Code Generation | Code | GitHub Copilot, Cursor |
| Multimodal | Text + Image + Audio | GPT-4o, Gemini Ultra |
All of these are "generative" — they create new content by learning from existing content. We will focus on LLMs because that is what powers everything in this course — RAG, Agents, LangChain — all of it is built on top of LLMs.
How ChatGPT Actually Works — The Full Journey
Let's trace one message, start to finish.
You open ChatGPT and type:
"What is the tallest mountain in the world?"
Here is exactly what happens:
Step 1 — Your Message Gets Combined With a System Prompt
You only see your message. But ChatGPT doesn't receive just your message. It receives something like this:
SYSTEM:
You are ChatGPT, a helpful, harmless, and honest
AI assistant made by OpenAI. Answer questions
clearly and concisely. Do not make up information.
USER:
What is the tallest mountain in the world?
The System Prompt is a hidden set of instructions that tells the model who it is and how to behave. You never see it. It's already there before you type anything.
This is important — you'll use System Prompts heavily when building AI applications.
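As a rough sketch of what "combining" means in code: many chat-style APIs accept a list of messages, each with a role and some content. The exact format varies by provider, so treat the `role`/`content` dictionaries, the `build_messages` helper, and the system prompt text below as illustrative assumptions rather than any specific vendor's API.

```python
def build_messages(system_prompt, user_message):
    """Combine the hidden system prompt with the visible user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

# Hypothetical system prompt; real products use their own (usually longer) text.
SYSTEM_PROMPT = "You are a helpful assistant. Answer clearly and concisely."

messages = build_messages(SYSTEM_PROMPT, "What is the tallest mountain in the world?")
for m in messages:
    print(f"{m['role'].upper()}: {m['content']}")
```

When you build your own application, you write the system prompt yourself, which is how you control the model's persona and rules.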
Step 2 — Text Gets Broken Into Tokens
The model cannot read words the way you do. It converts your text into tokens first.
Tokens are not exactly words. They are chunks of text — sometimes a full word, sometimes part of a word, sometimes punctuation.
"What is the tallest mountain in the world?"
→ ["What", " is", " the", " tall", "est", " mountain",
" in", " the", " world", "?"]
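Each token gets converted to a number (its ID in the model's vocabulary), because neural networks only work with numbers. The IDs below are illustrative; the real values depend on the tokenizer.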
"What" → 2061
" is" → 318
" the" → 262
" tall" → 9857
"est" → 395
" mountain"→ 8598
...
We will go very deep on tokens and tokenization in Module 1.3. For now, just know this conversion happens.
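Until then, a toy version of the text-to-IDs conversion can be sketched in a few lines. The vocabulary below is hypothetical and simply reuses the illustrative IDs from the example above. The greedy longest-match rule is a simplification chosen for readability; production tokenizers typically use learned schemes such as byte-pair encoding.

```python
# Hypothetical toy vocabulary reusing the illustrative IDs from the text.
VOCAB = {
    "What": 2061, " is": 318, " the": 262,
    " tall": 9857, "est": 395, " mountain": 8598,
}

def encode(text, vocab=VOCAB):
    """Greedy longest-match: at each position, take the longest known chunk."""
    ids, pos = [], 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers the text at position {pos}: {text[pos:]!r}")
    return ids

def decode(ids, vocab=VOCAB):
    """Map IDs back to their text chunks and join them."""
    id_to_piece = {i: p for p, i in vocab.items()}
    return "".join(id_to_piece[i] for i in ids)

ids = encode("What is the tallest mountain")
print(ids)          # [2061, 318, 262, 9857, 395, 8598]
print(decode(ids))  # What is the tallest mountain
```

Notice that the leading space is part of the token (" is", not "is"), which is why decoding is just a plain join.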
Step 3 — Tokens Pass Through the Transformer
The sequence of numbers (tokens) gets fed into the Transformer — the neural network that is the heart of every LLM.
The Transformer does one thing extremely well:
It looks at ALL tokens simultaneously and figures out how each token relates to every other token.
For your sentence, it understands:
- "tallest" is directly related to "mountain"
- "world" gives global scope to the question
- The whole sentence is asking for a factual comparison
This is the Attention Mechanism — the model pays different amounts of "attention" to different tokens depending on context. We'll cover this deeply in Phase 2.
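As a preview, the core computation (scaled dot-product attention) fits in a short sketch. The 2-dimensional vectors below are made up so that "tallest" and "mountain" point in similar directions. Real models use learned embeddings with hundreds or thousands of dimensions, and separate learned projections for queries, keys, and values, whereas this sketch reuses the same vectors for all three.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up vectors for three tokens (not real embeddings).
tokens = ["tallest", "mountain", "world"]
vectors = [[1.0, 0.2], [0.9, 0.3], [0.1, 1.0]]

_, weights = attention(vectors, vectors, vectors)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each printed row shows how much one token "attends" to every token. With these made-up vectors, "tallest" puts more weight on "mountain" than on "world".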
Step 4 — The Model Predicts the Next Token
After processing your input, the model produces a probability distribution over its entire vocabulary, which typically ranges from tens of thousands to a few hundred thousand tokens depending on the model.
For every single token in its vocabulary, it assigns a probability:
Next token probabilities:
"Mount" → 31.2%
"The" → 18.7%
"Everest" → 14.3%
"Mt" → 11.1%
"K2" → 2.1%
"Mars" → 0.0001%
...
It picks the most probable one (or near-most-probable, depending on temperature — we'll cover this in Module 1.3).
Let's say it picks "Mount".
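Where do those percentages come from? The network outputs one raw score (a "logit") per vocabulary token, and a softmax function turns the scores into probabilities. The sketch below uses a hypothetical five-token vocabulary with invented scores, so the percentages will not match the numbers above.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical raw scores for a tiny vocabulary.
logits = {"Mount": 3.1, "The": 2.6, "Everest": 2.3, "Mt": 2.0, "K2": 0.4}

probs = softmax(logits)
next_token = max(probs, key=probs.get)  # greedy choice: the single most probable token

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.1%}")
print("picked:", next_token)
```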
Step 5 — The Token Gets Added, Process Repeats
Now the model has:
Input + "Mount"
It runs the whole process again. New probability distribution:
"Everest" → 89.4%
"Fuji" → 3.1%
"Kilimanjaro" → 1.2%
...
Picks "Everest". Now it has:
Input + "Mount Everest"
Runs again. Picks " is". Then " the". Then " tallest". Then " mountain". And so on.
"Mount Everest is the tallest mountain in
the world, standing at 8,849 meters
(29,032 feet) above sea level."
This process is called autoregressive generation. Every token depends on all previous tokens. The model keeps generating until it produces a special "end of sequence" token that signals it's done, or until it reaches a maximum output length.
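The loop itself is simple enough to sketch. Here `toy_model` is a hypothetical stand-in for the Transformer: a hand-written lookup table that returns next-token probabilities for the sequence so far. Only the loop structure (predict, append, repeat, stop at the end token or a length limit) reflects how real generation works.

```python
END = "<end>"

# Hypothetical lookup table: answer-so-far -> next-token probabilities.
TABLE = {
    (): {"Mount": 0.7, "The": 0.3},
    ("Mount",): {" Everest": 0.9, " Fuji": 0.1},
    ("Mount", " Everest"): {" is": 0.8, END: 0.2},
    ("Mount", " Everest", " is"): {" the": 1.0},
    ("Mount", " Everest", " is", " the"): {" tallest": 1.0},
    ("Mount", " Everest", " is", " the", " tallest"): {" mountain": 1.0},
}

def toy_model(tokens):
    """Stand-in for an LLM: sees the full sequence, returns next-token probabilities."""
    # Everything after the question mark is the answer generated so far.
    answer_so_far = tuple(tokens[tokens.index("?") + 1:])
    return TABLE.get(answer_so_far, {END: 1.0})

def generate(prompt_tokens, max_new_tokens=20):
    """Autoregressive loop: predict one token, append it, and run again."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = toy_model(tokens)
        next_token = max(probs, key=probs.get)  # greedy pick
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[len(prompt_tokens):]

prompt = ["What", " is", " the", " tallest", " mountain", "?"]
answer = generate(prompt)
print("".join(answer))  # Mount Everest is the tallest mountain
```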
The Full Flow Visualized
You type a message
↓
System Prompt + Your Message combined
↓
Text converted to Tokens
↓
Tokens converted to Numbers
↓
Numbers fed into Transformer
↓
Transformer runs Attention across all tokens
↓
Probability distribution over vocabulary
↓
Next token selected (usually a high-probability one)
↓
Token added to sequence
↓
Repeat until [END] token
↓
Numbers converted back to Text
↓
Response streams to your screen
Why Does ChatGPT Sometimes Get Things Wrong?
This is one of the most important things to understand before building AI apps.
Remember — the model is not looking anything up. It is not connected to a database of facts. It is predicting the most probable next token based on patterns it saw during training.
This means:
If the training data had wrong information → the model learned wrong patterns → it will confidently produce wrong answers.
If the training data didn't cover something → the model has no pattern to follow → it may "hallucinate" — generate plausible-sounding but completely false information.
If something happened after the training cutoff → the model simply doesn't know → it might guess or make something up.
This is called hallucination, and it's not a bug. It follows directly from how these models work. The model always generates something, and it has no built-in check that tells it "I have no pattern for this."
This is exactly why RAG (Retrieval Augmented Generation) exists, which we'll cover in Phase 5. RAG is one of the most widely used ways to reduce hallucination. Instead of relying only on the model's internal knowledge, you inject real, current, verified information directly into the prompt. The model then generates based on that context instead of guessing, although it can still misread or ignore what it was given.
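A minimal sketch of the idea, under heavy simplifying assumptions: the "knowledge base" is a hard-coded list, retrieval is naive keyword overlap, and the prompt wording is invented. Real RAG systems usually retrieve with embeddings and a vector store, which Phase 5 covers.

```python
# Hypothetical in-memory knowledge base; a real system would use a document store.
DOCUMENTS = [
    "Mount Everest is about 8,849 meters tall according to the 2020 survey.",
    "K2 is the second-tallest mountain on Earth.",
    "The Nile is one of the longest rivers in the world.",
]

def words(text):
    """Lowercase the text and strip basic punctuation before splitting."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(question, documents, top_k=2):
    """Rank documents by how many words they share with the question."""
    q_words = words(question)
    scored = [(len(q_words & words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(question, documents):
    """Inject the retrieved text into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("How tall is Mount Everest?", DOCUMENTS)
print(prompt)
```

The model never needs to "remember" the height. The fact arrives in the prompt, and the model's job shrinks to reading and rephrasing it.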
Why Does It Give Different Answers Every Time?
This is where temperature comes in — and we'll go deep on this in Module 1.3.
Short version: the model doesn't always pick the single highest probability token. It picks probabilistically — meaning sometimes the 2nd or 3rd most likely token gets picked. This introduces variation.
Same question, two runs:
Run 1: "Mount Everest is the tallest mountain..."
Run 2: "The tallest mountain in the world is Mount Everest..."
Both correct. Different phrasing. Because different tokens got sampled.
Turn temperature to 0 → it always picks the highest probability token → effectively deterministic, the same answer almost every time (hosted models can still show small run-to-run differences).
Turn temperature up → more randomness → more creative, more varied, sometimes more wrong.
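Here is a small sketch of how temperature is commonly applied: the raw scores are divided by the temperature before the softmax, so low values sharpen the distribution and high values flatten it. The scores are invented, and treating temperature 0 as a plain argmax is a convention assumed here to avoid dividing by zero.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide scores by the temperature, then apply softmax."""
    scaled = [score / temperature for score in logits.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}

def sample(logits, temperature, rng):
    """Temperature 0 means greedy (argmax); otherwise sample from the distribution."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical raw scores for the first token of the answer.
logits = {"Mount": 3.1, "The": 2.6, "Everest": 2.3, "Mt": 2.0}

rng = random.Random(42)  # fixed seed so the demo is repeatable
for t in (0, 0.7, 1.5):
    picks = [sample(logits, t, rng) for _ in range(10)]
    print(f"temperature={t}: {picks}")
```

At temperature 0 every pick is "Mount". As the temperature rises, the other tokens show up more often, which is where the different phrasings across runs come from.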
One More Thing — ChatGPT Has No Memory By Default
This surprises a lot of people.
Every time you send a message in a conversation, the entire conversation history is sent back to the model from the beginning. The model itself stores nothing.
Message 1: "My name is Arjun"
Message 2: "What is my name?"
What actually gets sent to the model for Message 2:
USER: My name is Arjun
ASSISTANT: Nice to meet you, Arjun!
USER: What is my name?
The model reads all of it every single time and responds. It feels like memory — but it's just the conversation being replayed in full on every request.
This has a limit, the Context Window, which is the maximum amount of text (measured in tokens) the model can read at once. When the conversation gets too long, the application has to drop or summarize older messages, so the model no longer sees them.
This is also why Agent Memory is a whole topic in Phase 8 — building systems that give AI actual persistent memory beyond a single conversation.
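The replay-and-trim behavior can be sketched as follows. The word-count "tokenizer" and the tiny budgets are stand-ins chosen to keep the example readable. A real application would count tokens with the model's own tokenizer and might summarize old messages instead of dropping them.

```python
def count_tokens(text):
    """Very rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def fit_to_context(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        size = count_tokens(message["content"])
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "My name is Arjun"},
    {"role": "assistant", "content": "Nice to meet you, Arjun!"},
    {"role": "user", "content": "What is my name?"},
]

full = fit_to_context(history, max_tokens=100)  # everything fits
trimmed = fit_to_context(history, max_tokens=9)  # the oldest message is dropped
print(len(full), len(trimmed))  # 3 2
```

In this particular conversation the assistant's reply still mentions the name, so the model could recover it. With a tighter budget, or a reply that did not repeat the name, the model would have no way to know it.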
What is Generative AI — Clean Definition
Generative AI is a class of models that learn the statistical patterns of their training data well enough to produce new, original content — whether text, images, audio, or video — that follows those same patterns.
ChatGPT is a Generative AI that:
- Takes your message combined with a system prompt
- Tokenizes the input
- Runs it through a Transformer
- Generates output one token at a time based on probability
- Stops when it produces an end-of-sequence token
- Streams the result to your screen
3-Line Summary
- Generative AI doesn't retrieve stored answers — it generates new content by learning statistical patterns from training data and producing output token by token.
- ChatGPT works by combining your message with a system prompt, tokenizing it, running it through a Transformer, and repeatedly predicting the next most probable token until the response is complete.
- Hallucination happens because the model always generates something, even when it has no reliable pattern to draw on, which is exactly the problem RAG is designed to reduce later in this course.
Module 1.2 — Complete ✅
Coming up: Module 1.3 — Tokens, Context Window & Temperature
This is where things get really practical. You'll understand exactly what a token is, why token count matters for cost and performance, what context window limits mean for your applications, and how temperature controls creativity vs accuracy.