Module 3.1 — What is an Embedding & Why Does it Exist

Start With a Real Problem

Imagine you work at Netflix.

You have 10,000 movies. A user comes in and says:

"I want to watch something like Inception"

How do you find similar movies?

Option 1 — Match by genre tag

Inception tags: ["sci-fi", "thriller", "action"]

Find movies with same tags:
→ The Matrix       ✓ (sci-fi, action)
→ Interstellar     ✓ (sci-fi, thriller)
→ Die Hard         ? (action — but totally different vibe)
→ Avengers         ? (sci-fi, action — but very different)

Genre tags are too broad. Die Hard and Inception are both "action" — but they feel nothing alike.

Option 2 — Match by keywords in description

Inception description: 
"A thief who steals corporate secrets through 
dream-sharing technology..."

Search for movies with words like "thief", "dreams", "technology"
→ Maybe finds some, maybe misses obvious similar movies

Keywords are too literal. "Interstellar" doesn't share many words with "Inception" — but they feel very similar (mind-bending, emotional, sci-fi).

The real problem:

How do you capture the feeling of a movie — not just its tags or keywords — so you can find truly similar ones?

This is exactly the problem embeddings solve. And not just for movies — for any content. Documents, sentences, questions, code, images — anything.


The Big Idea — Turn Meaning Into Numbers

What if you could represent every movie as a point in space?

Movies that feel similar → placed close together. Movies that feel different → placed far apart.

Imagine a simple 2D space:

Mind-bending ↑
             │    • Inception
             │    • Interstellar  
             │    • The Matrix
             │              • Avengers
─────────────┼──────────────────────── → Action-heavy
             │
             │  • The Notebook
             │  • Titanic
             ↓
         Emotional/Romance

Now finding "similar to Inception" is just:

Find the points closest to Inception in this space.

Interstellar and The Matrix are close → recommend them. The Notebook is far away → don't recommend it.

An embedding is exactly this — converting something (a movie, a word, a sentence, a document) into a point in space — represented as a list of numbers.
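The 2D picture above can be sketched in a few lines of JavaScript. The coordinates below are invented to roughly mirror the diagram (first number = action-heavy, second = mind-bending); they do not come from any real model, and the function names are only illustrative.

```javascript
// Hypothetical 2D coordinates: [actionHeavy, mindBending].
// These numbers are made up to mirror the diagram, not produced by a model.
const movies = {
  "Inception":    [0.30, 0.90],
  "Interstellar": [0.30, 0.75],
  "The Matrix":   [0.35, 0.60],
  "Avengers":     [0.80, 0.30],
  "The Notebook": [0.10, -0.60],
  "Titanic":      [0.10, -0.75],
};

// Straight-line (Euclidean) distance between two points of equal length.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Return the k titles closest to the given title, nearest first.
function mostSimilar(title, k) {
  const target = movies[title];
  return Object.keys(movies)
    .filter((other) => other !== title)
    .map((other) => ({ title: other, dist: distance(target, movies[other]) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((entry) => entry.title);
}

console.log(mostSimilar("Inception", 2)); // [ 'Interstellar', 'The Matrix' ]
```

The same `distance` function works unchanged for vectors with hundreds of numbers, which is why the 2D intuition carries over to real embeddings.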


From 2D to Real Embeddings

In the example above we used 2 dimensions — easy to visualize on paper.

Real embeddings use hundreds or thousands of dimensions:

OpenAI text-embedding-3-small              → 1,536 dimensions
OpenAI text-embedding-3-large              → 3,072 dimensions
Google text-embedding-004                  → 768 dimensions
Small/fast models (e.g. all-MiniLM-L6-v2)  → 384 dimensions

Each dimension contributes a little to the overall meaning, although a single dimension rarely maps to a human-readable concept like "mind-bending". You can't visualize 1,536 dimensions, but the math works the same way as in our simple 2D example.

"cat" as a 1536-dimensional embedding (illustrative values):
[0.23, -0.87, 0.41, 0.92, -0.13, 0.67, ...]
 dim1   dim2  dim3  dim4   dim5  dim6  ...1536 total numbers

This list of 1,536 numbers is the model's mathematical representation of what "cat" means.


Words vs Sentences vs Documents

Embeddings work at different levels — and this is important for RAG:

Word-level embedding:

"bank" → [0.2, 0.8, 0.4, ...]
Single word represented as a vector

Sentence-level embedding:

"I went to the bank to deposit money"
→ [0.6, 0.3, 0.9, ...]
Whole sentence as one vector
Captures the full meaning of the sentence

Document-level embedding:

Entire paragraph or page
→ [0.4, 0.7, 0.2, ...]
Whole document compressed into one vector

For RAG — we mostly use sentence and paragraph level embeddings. We break documents into chunks and embed each chunk.
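As a rough illustration of the chunking step, here is a minimal sketch that groups sentences into chunks of roughly a maximum character length. `chunkText` and `maxChars` are hypothetical names, and the sentence splitting is deliberately naive; real pipelines often chunk by tokens and add overlap between chunks.

```javascript
// Group sentences into chunks of roughly maxChars characters.
// chunkText and maxChars are illustrative names, not part of any library.
// A single sentence longer than maxChars is kept whole as its own chunk.
function chunkText(text, maxChars = 200) {
  // Naive sentence split: break after ., ! or ? followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length > maxChars && current) {
      // Adding this sentence would overflow, so close the current chunk.
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc =
  "Aspirin reduces pain. It also lowers fever. High doses can irritate the stomach.";
console.log(chunkText(doc, 45));
// [ 'Aspirin reduces pain. It also lowers fever.',
//   'High doses can irritate the stomach.' ]
```

Each string in the returned array would then be sent to the embedding model separately, giving one vector per chunk.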


The Two Types of Embeddings You'll Use

Type 1 — Token Embeddings (inside the LLM)

These are the embeddings we talked about in Phase 2. They live inside the Transformer. Every token gets an embedding, and Attention updates them as they flow through layers.

You don't directly use these as a developer. They happen internally.

Type 2 — Text Embeddings (for search and RAG)

These are what YOU will use. You send text to an embedding model — it sends back a single vector representing the whole text.

// You send this text:
"What are the side effects of aspirin?"

// Embedding model sends back:
[0.23, -0.87, 0.41, 0.92, -0.13, 0.67, ... 1536 numbers]

// This vector captures the full meaning of the question

This is what we'll use in Phase 4 (Vector Databases) and Phase 5 (RAG).


Why Embeddings Are Better Than Keywords

Let's make this very concrete.

You have a document with this sentence:

"The medication reduced inflammation significantly."

A user asks:

"Does this drug help with swelling?"

Keyword search fails:

Document words: "medication", "reduced", "inflammation", "significantly"
Query words:    "drug", "help", "swelling"

Matching words: ZERO
Result: No match found ✗

Zero overlap in words. Traditional search returns nothing.
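The failure above is easy to reproduce. This is a minimal sketch of exact word matching; `tokenize` and `sharedWords` are illustrative helper names, and real keyword engines add stemming and ranking on top, which would not help here either because no word stems are shared.

```javascript
// Lowercase a sentence and split it into a set of plain words.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z]+/g) ?? []);
}

// Words that appear in both the document and the query.
function sharedWords(documentText, queryText) {
  const docWords = tokenize(documentText);
  return [...tokenize(queryText)].filter((word) => docWords.has(word));
}

const overlap = sharedWords(
  "The medication reduced inflammation significantly.",
  "Does this drug help with swelling?"
);
console.log(overlap); // [] (no shared words, so exact matching finds nothing)
```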

Embedding search succeeds:

"medication" ≈ "drug"          (similar meaning)
"inflammation" ≈ "swelling"    (same medical concept)
"reduced significantly" ≈ "help" (similar intent)

Embedding of document sentence: [0.8, 0.3, 0.7, ...]
Embedding of user question:     [0.7, 0.4, 0.6, ...]

These vectors are CLOSE in space → high similarity → match found ✓

This is the superpower of embeddings.

Embeddings capture meaning — not just words. So you can find similar content even when completely different words are used.

This is called semantic search — search by meaning, not keywords.
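One common way to measure how "close" two embeddings are is cosine similarity. The sketch below applies it to the two shortened example vectors from above; the third vector is made up to stand in for unrelated text. Real embeddings have far more numbers, but the function is the same.

```javascript
// Cosine similarity: values near 1 mean the vectors point the same way,
// values near 0 or below mean they have little in common.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const documentVec = [0.8, 0.3, 0.7];   // "The medication reduced inflammation significantly."
const questionVec = [0.7, 0.4, 0.6];   // "Does this drug help with swelling?"
const unrelatedVec = [-0.6, 0.9, -0.2]; // made-up vector for an unrelated sentence

console.log(cosineSimilarity(documentVec, questionVec));  // roughly 0.99
console.log(cosineSimilarity(documentVec, unrelatedVec)); // negative, so not similar
```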


How Embedding Models Are Trained

You might wonder — how does the model learn that "medication" and "drug" should have similar embeddings?

The training process works like this:

The model is trained on billions of sentences. The core idea it picks up: words and phrases that appear in similar contexts should end up with similar embeddings.

Training data examples:

"The doctor prescribed medication for pain"
"The doctor prescribed a drug for pain"

"She had inflammation in her knee"
"She had swelling in her knee"

After seeing millions of examples like this — the model learns:

"medication" and "drug" → appear in same contexts → similar embeddings
"inflammation" and "swelling" → appear in same contexts → similar embeddings
"medication" and "pizza" → almost never same context → very different embeddings

Nobody told the model these words are related. It figured it out purely from patterns in text.
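The "similar contexts" idea can be illustrated with a toy that simply counts which words share a sentence. This is not how modern embedding models are trained (they learn their vectors with neural networks on far more data); it only shows why shared context leads to similar vectors. The pizza sentence is made up for contrast.

```javascript
// Toy corpus: the example sentences above plus one invented unrelated sentence.
const corpus = [
  "the doctor prescribed medication for pain",
  "the doctor prescribed a drug for pain",
  "she had inflammation in her knee",
  "she had swelling in her knee",
  "we ordered pizza with extra cheese",
];

// For each word, count which other words share a sentence with it.
function contextCounts(sentences) {
  const counts = new Map();
  for (const sentence of sentences) {
    const words = sentence.split(" ");
    for (const word of words) {
      if (!counts.has(word)) counts.set(word, new Map());
      const row = counts.get(word);
      for (const other of words) {
        if (other !== word) row.set(other, (row.get(other) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Cosine similarity between the context counts of two words.
function contextSimilarity(counts, wordA, wordB) {
  const a = counts.get(wordA);
  const b = counts.get(wordB);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [key, value] of a) {
    normA += value * value;
    dot += value * (b.get(key) ?? 0);
  }
  for (const value of b.values()) normB += value * value;
  return dot / Math.sqrt(normA * normB);
}

const table = contextCounts(corpus);
console.log(contextSimilarity(table, "medication", "drug"));  // about 0.91
console.log(contextSimilarity(table, "medication", "pizza")); // 0
```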


Static vs Contextual Embeddings

This is an important distinction:

Static Embeddings (old way):

"bank" always → [0.4, 0.7, 0.2, ...]

Same vector every time — regardless of context.
"river bank" → same vector as "money bank"

Contextual Embeddings (modern way — what LLMs use):

"I walked to the river bank"
→ "bank" gets embedding influenced by "river"
→ [0.2, 0.9, 0.1, ...] (river meaning)

"I went to the bank to get money"  
→ "bank" gets embedding influenced by "money"
→ [0.8, 0.1, 0.6, ...] (financial meaning)

Different context → different embedding → different meaning captured

Modern embedding models for RAG produce contextual embeddings — the whole sentence influences the meaning of each part.
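The difference can be made concrete with a toy. The static version is a plain table lookup. The "contextual" version below is only a crude stand-in that averages a word's vector with known neighbouring words; real models use attention layers instead. All vectors and function names here are invented for illustration.

```javascript
// Made-up 2D static vectors: [nature, finance]. Not from a real model.
const staticTable = new Map([
  ["river", [1.0, 0.0]],
  ["money", [0.0, 1.0]],
  ["bank", [0.5, 0.5]],
]);

// Static embedding: a table lookup, so the surrounding sentence is ignored.
function staticEmbedding(word) {
  return staticTable.get(word);
}

// Crude stand-in for a contextual model: average the word's static vector
// with the vectors of the known words around it. This only demonstrates
// that the output now depends on the sentence.
function toyContextualEmbedding(word, sentence) {
  const neighbours = sentence
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w !== word && staticTable.has(w));
  const vectors = [word, ...neighbours].map((w) => staticTable.get(w));
  return vectors[0].map(
    (_, i) => vectors.reduce((sum, v) => sum + v[i], 0) / vectors.length
  );
}

console.log(staticEmbedding("bank")); // [ 0.5, 0.5 ] in every sentence
console.log(toyContextualEmbedding("bank", "I walked to the river bank"));      // [ 0.75, 0.25 ]
console.log(toyContextualEmbedding("bank", "I went to the bank to get money")); // [ 0.25, 0.75 ]
```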


A Practical Example — What Embeddings Look Like in Code

Here's how you actually generate embeddings using the OpenAI API in JavaScript:


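    // Request an embedding for one piece of text from the OpenAI API.
    // Needs Node 18+ (built-in fetch) and an ES module or async function (for await).
    const response = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`
        },
        body: JSON.stringify({
            model: "text-embedding-3-small",  // embedding model
            input: "What are the side effects of aspirin?"
        })
    });

    // Fail loudly on HTTP errors (bad key, rate limit, ...) instead of reading undefined fields.
    if (!response.ok) {
        throw new Error(`Embedding request failed: ${response.status}`);
    }

    const data = await response.json();
    const embedding = data.data[0].embedding;  // array of floats

    console.log(embedding.length);      // 1536
    console.log(embedding.slice(0, 5)); // first five of the 1,536 numbers (exact values vary)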

That array of 1,536 numbers is how the model represents the meaning of your sentence.

Store it. Compare it. Search with it.
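In practice you usually embed many chunks at once. The embeddings endpoint accepts an array of strings as `input` and returns one embedding per item, each tagged with an `index`. The sketch below wraps that in small helpers; `buildEmbeddingRequest`, `extractEmbeddings` and `embedTexts` are hypothetical names chosen for this example, not part of any SDK.

```javascript
// Build the URL and fetch options for a batch embedding request.
// Helper names are illustrative; the endpoint and model match the example above.
function buildEmbeddingRequest(texts, apiKey) {
  return {
    url: "https://api.openai.com/v1/embeddings",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${apiKey}`,
      },
      // `input` may be an array of strings: one embedding comes back per item.
      body: JSON.stringify({ model: "text-embedding-3-small", input: texts }),
    },
  };
}

// Pull the vectors out of the response, ordered to line up with the input texts.
function extractEmbeddings(json) {
  return [...json.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}

// Send the request (Node 18+ for built-in fetch) and return an array of vectors.
async function embedTexts(texts, apiKey) {
  const { url, options } = buildEmbeddingRequest(texts, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`);
  }
  return extractEmbeddings(await response.json());
}
```

A call such as `await embedTexts(chunks, process.env.OPENAI_API_KEY)` would then return one vector per chunk, in the same order as `chunks`.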


The Three Things You Do With Embeddings

1. Store them

Convert your documents/chunks into embeddings and store them in a Vector Database. (Phase 4)

Document chunk → Embedding → Store in Vector DB

2. Compare them

Convert a user's question into an embedding and compare it against stored embeddings. (Phase 5)

User question → Embedding → Compare with stored embeddings

3. Find the closest ones

Return the document chunks whose embeddings are closest to the question embedding.

Closest embeddings → Most relevant chunks → Feed to LLM → Answer

This is the entire RAG pipeline — and it all runs on embeddings.
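The three steps can be sketched end to end with a tiny in-memory store. This is a simplified stand-in under stated assumptions: `InMemoryVectorStore` is a hypothetical class (a real vector database replaces it in Phase 4), and the 3-number vectors are made up rather than produced by an embedding model.

```javascript
// Cosine similarity: values near 1 mean the vectors point the same way.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical stand-in for a vector database.
class InMemoryVectorStore {
  constructor() {
    this.items = [];
  }

  // Step 1: store a chunk together with its embedding.
  add(text, embedding) {
    this.items.push({ text, embedding });
  }

  // Steps 2 and 3: score every stored vector against the query, keep the top k.
  search(queryEmbedding, k = 2) {
    return this.items
      .map(({ text, embedding }) => ({ text, score: cosine(queryEmbedding, embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// Made-up 3-number "embeddings"; a real system would get these from a model.
const store = new InMemoryVectorStore();
store.add("The medication reduced inflammation significantly.", [0.8, 0.3, 0.7]);
store.add("Aspirin can irritate the stomach.", [0.6, 0.2, 0.5]);
store.add("Pizza dough should rest overnight.", [0.1, 0.9, -0.3]);

const questionEmbedding = [0.7, 0.4, 0.6]; // stands for "Does this drug help with swelling?"
const topChunks = store.search(questionEmbedding, 2);

// The retrieved chunks become the context in the prompt sent to the LLM.
const prompt = `Answer using only this context:\n${topChunks.map((c) => c.text).join("\n")}`;
console.log(prompt);
```

Scanning every stored vector like this is fine for a few thousand chunks; vector databases exist to make the same search fast at much larger scale.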


The Mental Model

Think of embeddings like GPS coordinates — but for meaning instead of location.

GPS coordinates:
New York  → [40.7128° N, 74.0060° W]
London    → [51.5074° N, 0.1278° W]

Close coordinates = close locations

Embeddings:
"cat"     → [0.9, 0.8, 0.1, 0.05, ...]
"dog"     → [0.8, 0.9, 0.1, 0.06, ...]
"pizza"   → [0.1, 0.2, 0.9, 0.80, ...]

Close embeddings = similar meaning

When you ask "find me something similar to this question" — you're really asking "find the GPS coordinates closest to these coordinates in 1,536-dimensional space."


3-Line Summary

  1. An embedding converts text into a list of numbers (a vector) that captures its meaning — text with similar meaning gets similar numbers, so you can find related content mathematically.
  2. Embeddings are better than keyword search because they capture meaning, not just words: "medication" and "drug" have similar embeddings even though they're different words.
  3. As a developer you'll use embeddings in two steps — embed your documents and store them, then embed user questions and find the closest stored embeddings — this is the core of RAG.

Module 3.1 — Complete ✅

Coming up — Module 3.2 — Vectors, Vector Space & Dimensions

Now we go one level deeper. You understand what an embedding IS — next you'll understand exactly how these lists of numbers work mathematically, what "vector space" means, and how the model finds similar embeddings. No heavy math — just clear intuition with real examples.
