Start With a Real Problem
Imagine you work at Netflix.
You have 10,000 movies. A user comes in and says:
"I want to watch something like Inception"
How do you find similar movies?
Option 1 — Match by genre tag
Inception tags: ["sci-fi", "thriller", "action"]
Find movies with same tags:
→ The Matrix ✓ (sci-fi, action)
→ Interstellar ✓ (sci-fi, thriller)
→ Die Hard ? (action — but totally different vibe)
→ Avengers ? (sci-fi, action — but very different)
Genre tags are too broad. Die Hard and Inception are both "action" — but they feel nothing alike.
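To see the problem in code, here is a minimal sketch of tag matching. The tag lists are the ones from the example above; the function names `sharedTagCount` and `rankByTags` are made up for illustration.

```javascript
// Tags taken from the example above.
const inceptionTags = ["sci-fi", "thriller", "action"];
const movies = {
  "The Matrix": ["sci-fi", "action"],
  "Interstellar": ["sci-fi", "thriller"],
  "Die Hard": ["action"],
  "Avengers": ["sci-fi", "action"],
};

// Count how many tags a movie shares with the query tags.
function sharedTagCount(queryTags, movieTags) {
  const query = new Set(queryTags);
  return movieTags.filter((tag) => query.has(tag)).length;
}

// Rank every movie in the catalog by shared-tag count, highest first.
function rankByTags(queryTags, catalog) {
  return Object.entries(catalog)
    .map(([title, tags]) => ({ title, shared: sharedTagCount(queryTags, tags) }))
    .sort((a, b) => b.shared - a.shared);
}

const tagRanking = rankByTags(inceptionTags, movies);
console.log(tagRanking);
```

Avengers ends up with the same score as The Matrix and Interstellar, because a tag count has no way to express "very different vibe".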
Option 2 — Match by keywords in description
Inception description:
"A thief who steals corporate secrets through
dream-sharing technology..."
Search for movies with words like "thief", "dreams", "technology"
→ Maybe finds some, maybe misses obviously similar movies
Keywords are too literal. "Interstellar" doesn't share many words with "Inception" — but they feel very similar (mind-bending, emotional, sci-fi).
The real problem:
How do you capture the feeling of a movie — not just its tags or keywords — so you can find truly similar ones?
This is exactly the problem embeddings solve. And not just for movies — for any content. Documents, sentences, questions, code, images — anything.
The Big Idea — Turn Meaning Into Numbers
What if you could represent every movie as a point in space?
Movies that feel similar → placed close together. Movies that feel different → placed far apart.
Imagine a simple 2D space:
  Mind-bending ↑
               │      • Inception
               │   • Interstellar
               │         • The Matrix
               │                  • Avengers
  ─────────────┼──────────────────────── → Action-heavy
               │
               │ • The Notebook
               │   • Titanic
               ↓
        Emotional/Romance
Now finding "similar to Inception" is just:
Find the points closest to Inception in this space.
Interstellar and The Matrix are close → recommend them. The Notebook is far away → don't recommend it.
An embedding is exactly this: something (a movie, a word, a sentence, a document) converted into a point in space, and that point is represented as a list of numbers.
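Here is a rough sketch of "find the closest points" for the 2D picture. The coordinates are hand-picked guesses (not output from any model), and `euclideanDistance` and `nearestMovies` are illustrative names.

```javascript
// Hypothetical coordinates: [actionHeavy, mindBending]. Hand-picked, not from a real model.
const moviePoints = {
  "Inception": [0.6, 0.9],
  "Interstellar": [0.4, 0.85],
  "The Matrix": [0.7, 0.75],
  "Avengers": [0.9, 0.3],
  "The Notebook": [0.1, -0.7],
  "Titanic": [0.2, -0.8],
};

// Straight-line (Euclidean) distance between two points of equal length.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Return the k movies closest to the given title, nearest first.
function nearestMovies(title, points, k) {
  const target = points[title];
  return Object.entries(points)
    .filter(([other]) => other !== title)
    .map(([other, point]) => ({ title: other, distance: euclideanDistance(target, point) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

console.log(nearestMovies("Inception", moviePoints, 2));
```

With these made-up coordinates, the two nearest neighbours of Inception are The Matrix and Interstellar, while The Notebook and Titanic land far away.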
From 2D to Real Embeddings
In the example above we used 2 dimensions — easy to visualize on paper.
Real embeddings use hundreds or thousands of dimensions:
OpenAI text-embedding-3-small → 1,536 dimensions
OpenAI text-embedding-3-large → 3,072 dimensions
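Google text-embedding-004 → 768 dimensions
Small/fast models (e.g. all-MiniLM-L6-v2) → 384 dimensions
Taken together, the dimensions capture aspects of meaning, though a single dimension rarely maps to a clean label like "mind-bending" the way our hand-drawn axes did. You can't visualize 1,536 dimensions, but the math works the same way as our simple 2D example.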
"cat" as a 1536-dimensional embedding:
[0.23, -0.87, 0.41, 0.92, -0.13, 0.67, ...]
dim1 dim2 dim3 dim4 dim5 dim6 ...1536 total numbers
This list of 1,536 numbers IS the meaning of "cat" — captured mathematically.
Words vs Sentences vs Documents
Embeddings work at different levels — and this is important for RAG:
Word-level embedding:
"bank" → [0.2, 0.8, 0.4, ...]
Single word represented as a vector
Sentence-level embedding:
"I went to the bank to deposit money"
→ [0.6, 0.3, 0.9, ...]
Whole sentence as one vector
Captures the full meaning of the sentence
Document-level embedding:
Entire paragraph or page
→ [0.4, 0.7, 0.2, ...]
Whole document compressed into one vector
For RAG — we mostly use sentence and paragraph level embeddings. We break documents into chunks and embed each chunk.
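As a rough idea of what "break documents into chunks" can look like, here is a minimal sketch that groups paragraphs into chunks under a size limit. The `chunkText` helper and the 500-character default are assumptions for illustration; real pipelines often split by tokens and add overlap between chunks.

```javascript
// Split text into paragraph-based chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole to keep this sketch short.
function chunkText(text, maxChars = 500) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate; // still fits, keep growing this chunk
    } else {
      chunks.push(current); // chunk is full, start a new one
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sampleDocument = "Aspirin is a common pain reliever.\n\nIt can irritate the stomach.\n\nAsk a doctor before combining it with other drugs.";
console.log(chunkText(sampleDocument, 70));
```

Each returned chunk would then be sent to the embedding model on its own, giving one vector per chunk.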
The Two Types of Embeddings You'll Use
Type 1 — Token Embeddings (inside the LLM)
These are the embeddings we talked about in Phase 2. They live inside the Transformer. Every token gets an embedding, and Attention updates them as they flow through layers.
You don't directly use these as a developer. They happen internally.
Type 2 — Text Embeddings (for search and RAG)
These are what YOU will use. You send text to an embedding model — it sends back a single vector representing the whole text.
// You send this text:
"What are the side effects of aspirin?"
// Embedding model sends back:
[0.23, -0.87, 0.41, 0.92, -0.13, 0.67, ... 1536 numbers]
// This vector captures the full meaning of the question
This is what we'll use in Phase 4 (Vector Databases) and Phase 5 (RAG).
Why Embeddings Are Better Than Keywords
Let's make this very concrete.
You have a document with this sentence:
"The medication reduced inflammation significantly."
A user asks:
"Does this drug help with swelling?"
Keyword search fails:
Document words: "medication", "reduced", "inflammation", "significantly"
Query words: "drug", "help", "swelling"
Matching words: ZERO
Result: No match found ✗
Zero overlap in words. Traditional search returns nothing.
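A tiny sketch of that word-overlap check, using the two sentences above. `tokenize` and `sharedWords` are illustrative helpers; real keyword engines add stemming and ranking, but the core idea of matching exact words is the same.

```javascript
// Lowercase a sentence and split it into words, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) ?? [];
}

// Return the distinct words that appear in both texts.
function sharedWords(a, b) {
  const wordsInB = new Set(tokenize(b));
  return [...new Set(tokenize(a))].filter((word) => wordsInB.has(word));
}

const documentSentence = "The medication reduced inflammation significantly.";
const userQuestion = "Does this drug help with swelling?";

console.log(sharedWords(documentSentence, userQuestion)); // []
```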
Embedding search succeeds:
"medication" ≈ "drug" (similar meaning)
"inflammation" ≈ "swelling" (same medical concept)
"reduced significantly" ≈ "help" (similar intent)
Embedding of document sentence: [0.8, 0.3, 0.7, ...]
Embedding of user question: [0.7, 0.4, 0.6, ...]
These vectors are CLOSE in space → high similarity → match found ✓
This is the superpower of embeddings.
Embeddings capture meaning — not just words. So you can find similar content even when completely different words are used.
This is called semantic search — search by meaning, not keywords.
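One common way to measure "close in space" is cosine similarity, which compares the direction of two vectors. The sketch below applies it to the toy 3-number vectors from the example; the `unrelatedVector` is a made-up contrast case, and real embeddings would have hundreds of numbers.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated, -1 means opposite.
// Returns NaN if either vector is all zeros.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const documentVector = [0.8, 0.3, 0.7]; // toy values from the example above
const questionVector = [0.7, 0.4, 0.6]; // toy values from the example above
const unrelatedVector = [0.1, 0.9, -0.5]; // hypothetical, for contrast

console.log(cosineSimilarity(documentVector, questionVector));  // close to 1
console.log(cosineSimilarity(documentVector, unrelatedVector)); // close to 0
```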
How Embedding Models Are Trained
You might wonder — how does the model learn that "medication" and "drug" should have similar embeddings?
The training process works like this:
The model reads billions of sentences and learns that words and phrases appearing in similar contexts should have similar embeddings.
Training data examples:
"The doctor prescribed medication for pain"
"The doctor prescribed a drug for pain"
"She had inflammation in her knee"
"She had swelling in her knee"
After seeing millions of examples like this — the model learns:
"medication" and "drug" → appear in same contexts → similar embeddings
"inflammation" and "swelling" → appear in same contexts → similar embeddings
"medication" and "pizza" → almost never same context → very different embeddings
Nobody told the model these words are related. It figured it out purely from patterns in text.
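The "same contexts" idea can be illustrated with plain counting. This is only a toy: real embedding models are neural networks trained on far more data, not word-set comparisons. The helpers `buildContexts` and `contextOverlap` are invented for this sketch, and the sentences are the four training examples above.

```javascript
const trainingSentences = [
  "The doctor prescribed medication for pain",
  "The doctor prescribed a drug for pain",
  "She had inflammation in her knee",
  "She had swelling in her knee",
];

// For each word, collect the set of other words it shares a sentence with.
function buildContexts(sentences) {
  const contexts = new Map();
  for (const sentence of sentences) {
    const words = sentence.toLowerCase().match(/[a-z]+/g) ?? [];
    for (const word of words) {
      if (!contexts.has(word)) contexts.set(word, new Set());
      for (const other of words) {
        if (other !== word) contexts.get(word).add(other);
      }
    }
  }
  return contexts;
}

// Jaccard overlap: shared context words divided by all context words (0 to 1).
function contextOverlap(contexts, wordA, wordB) {
  const a = contexts.get(wordA) ?? new Set();
  const b = contexts.get(wordB) ?? new Set();
  const shared = [...a].filter((word) => b.has(word)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

const contexts = buildContexts(trainingSentences);
console.log(contextOverlap(contexts, "medication", "drug"));        // high
console.log(contextOverlap(contexts, "inflammation", "swelling"));  // high
console.log(contextOverlap(contexts, "medication", "swelling"));    // 0
```

Even this crude count pairs up "medication" with "drug" and "inflammation" with "swelling" without anyone labeling them as related.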
Static vs Contextual Embeddings
This is an important distinction:
Static Embeddings (old way):
"bank" always → [0.4, 0.7, 0.2, ...]
Same vector every time — regardless of context.
"bank" in "river bank" → same vector as "bank" in "money bank"
Contextual Embeddings (modern way — what LLMs use):
"I walked to the river bank"
→ "bank" gets embedding influenced by "river"
→ [0.2, 0.9, 0.1, ...] (river meaning)
"I went to the bank to get money"
→ "bank" gets embedding influenced by "money"
→ [0.8, 0.1, 0.6, ...] (financial meaning)
Different context → different embedding → different meaning captured
Modern embedding models for RAG are contextual: the whole sentence influences how each word is read before everything is combined into the single vector you get back.
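The difference can be mimicked with a toy. A static embedding is just a lookup table. The "contextual" function below fakes context by blending a word's vector with its neighbours' vectors; real models do this with attention layers, not simple averaging. All vectors and function names here are hypothetical.

```javascript
// Hypothetical 2D vectors. Real models learn hundreds of dimensions.
const staticVectors = new Map([
  ["bank", [0.5, 0.5]],
  ["river", [0.0, 1.0]],
  ["money", [1.0, 0.0]],
]);

// Static lookup: the surrounding words are ignored entirely.
function staticEmbedding(word) {
  return staticVectors.get(word);
}

// Toy "contextual" embedding: average the word's own vector with the mean
// vector of the other known words in the sentence.
function toyContextualEmbedding(word, sentence) {
  const own = staticVectors.get(word);
  const others = (sentence.toLowerCase().match(/[a-z]+/g) ?? [])
    .filter((w) => w !== word && staticVectors.has(w))
    .map((w) => staticVectors.get(w));
  if (others.length === 0) return own;
  const mean = own.map((_, i) => others.reduce((sum, v) => sum + v[i], 0) / others.length);
  return own.map((value, i) => (value + mean[i]) / 2);
}

console.log(staticEmbedding("bank"));
console.log(toyContextualEmbedding("bank", "I walked to the river bank"));
console.log(toyContextualEmbedding("bank", "I went to the bank to get money"));
```

The static lookup returns the same vector in both cases, while the toy contextual version pulls "bank" toward "river" in one sentence and toward "money" in the other.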
A Practical Example — What Embeddings Look Like in Code
Here's how you actually generate embeddings using the OpenAI API in JavaScript:
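// Run this as an ES module (for example a .mjs file) so top-level await works.
const response = await fetch("https://api.openai.com/v1/embeddings", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: "text-embedding-3-small", // embedding model
    input: "What are the side effects of aspirin?"
  })
});

// Fail loudly on HTTP errors instead of reading a missing field below.
if (!response.ok) {
  throw new Error(`Embedding request failed: ${response.status}`);
}

const data = await response.json();
const embedding = data.data[0].embedding;

console.log(embedding.length); // 1536
console.log(embedding.slice(0, 5)); // the first five numbers, e.g. [0.23, -0.87, 0.41, 0.92, -0.13]
// ... 1531 more numbers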
That array of 1,536 numbers IS the meaning of your sentence.
Store it. Compare it. Search with it.
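When you have many chunks, the same endpoint also accepts an array of strings as `input`, so several texts can be embedded in one request. Below is a hedged sketch of that: the helper names `buildEmbeddingRequest` and `embedTexts` are invented here, and it assumes a runtime with a global `fetch` (such as Node 18 or newer).

```javascript
// Build the HTTP request for embedding several texts in one call.
// Helper name and shape are illustrative, not part of any SDK.
function buildEmbeddingRequest(texts, apiKey, model = "text-embedding-3-small") {
  return {
    url: "https://api.openai.com/v1/embeddings",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, input: texts }),
    },
  };
}

// Send the request and return one vector per input text, in input order.
async function embedTexts(texts, apiKey) {
  const { url, options } = buildEmbeddingRequest(texts, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }
  const { data } = await response.json();
  // Each item carries an index that says which input it belongs to.
  return [...data].sort((a, b) => a.index - b.index).map((item) => item.embedding);
}
```

A call might look like `const vectors = await embedTexts(["first chunk", "second chunk"], process.env.OPENAI_API_KEY);`, giving back two arrays of numbers.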
The Three Things You Do With Embeddings
1. Store them
Convert your documents/chunks into embeddings and store them in a Vector Database. (Phase 4)
Document chunk → Embedding → Store in Vector DB
2. Compare them
Convert a user's question into an embedding and compare it against stored embeddings. (Phase 5)
User question → Embedding → Compare with stored embeddings
3. Find the closest ones
Return the document chunks whose embeddings are closest to the question embedding.
Closest embeddings → Most relevant chunks → Feed to LLM → Answer
This is the entire RAG pipeline — and it all runs on embeddings.
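The three steps above can be sketched with a tiny in-memory store. This is a stand-in for a real vector database, with hand-written toy vectors instead of real embeddings; the class name `InMemoryVectorStore` and the second document are invented for illustration.

```javascript
// Similarity between two vectors of equal length (1 = same direction).
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (norm(a) * norm(b));
}

// Minimal in-memory stand-in for a vector database (hypothetical).
class InMemoryVectorStore {
  constructor() {
    this.entries = [];
  }

  // Step 1: store a chunk together with its embedding.
  add(text, embedding) {
    this.entries.push({ text, embedding });
  }

  // Steps 2 and 3: compare the query against every entry, return the k closest.
  search(queryEmbedding, k = 3) {
    return this.entries
      .map(({ text, embedding }) => ({ text, score: cosineSimilarity(queryEmbedding, embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

const store = new InMemoryVectorStore();
store.add("The medication reduced inflammation significantly.", [0.8, 0.3, 0.7]);
store.add("The pizza arrived cold.", [0.1, 0.9, -0.5]); // hypothetical chunk and vector

// Toy vector standing in for the embedded question "Does this drug help with swelling?"
const topChunks = store.search([0.7, 0.4, 0.6], 1);
console.log(topChunks);
```

Scanning every entry like this is fine for a few thousand chunks; vector databases exist because larger collections need indexes that avoid comparing against everything.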
The Mental Model
Think of embeddings like GPS coordinates — but for meaning instead of location.
GPS coordinates:
New York → [40.7128° N, 74.0060° W]
London → [51.5074° N, 0.1278° W]
Close coordinates = close locations
Embeddings:
"cat" → [0.9, 0.8, 0.1, 0.05, ...]
"dog" → [0.8, 0.9, 0.1, 0.06, ...]
"pizza" → [0.1, 0.2, 0.9, 0.80, ...]
Close embeddings = similar meaning
When you ask "find me something similar to this question" — you're really asking "find the GPS coordinates closest to these coordinates in 1,536-dimensional space."
3-Line Summary
- An embedding converts text into a list of numbers (a vector) that captures its meaning — text with similar meaning gets similar numbers, so you can find related content mathematically.
- Embeddings are better than keyword search because they capture meaning, not just words: "medication" and "drug" have similar embeddings even though they're different words.
- As a developer you'll use embeddings in two steps — embed your documents and store them, then embed user questions and find the closest stored embeddings — this is the core of RAG.
Module 3.1 — Complete ✅
Coming up — Module 3.2 — Vectors, Vector Space & Dimensions
Now we go one level deeper. You understand what an embedding IS — next you'll understand exactly how these lists of numbers work mathematically, what "vector space" means, and how the model finds similar embeddings. No heavy math — just clear intuition with real examples.