Module 1.1 — AI vs ML vs Deep Learning vs LLM

The Problem With How People Learn This

Most developers hear these terms thrown around — AI, Machine Learning, Deep Learning, Neural Networks, LLM — and they either assume they all mean the same thing, or they feel too embarrassed to ask what the difference actually is.

Here's the truth: they are not the same thing. They are nested inside each other, like Russian dolls. Once you see the structure, everything else in this course will make more sense.

Let's build that picture right now.


Start With One Question

"How does a computer normally solve a problem?"

Think about how you'd write a program to check if an email is spam.

The traditional way — you'd write rules:

if email contains "free money" → mark as spam
if email contains "click here now" → mark as spam
if sender is unknown → mark as spam

This works. Until it doesn't.

What happens when spammers write "fr33 m0ney" instead? Or they start writing perfect English? Your rules break. You write more rules. They adapt. You adapt. It becomes an endless war — and you're always losing because you're always reacting.
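To make that concrete, here is a minimal sketch of the three rules listed above in Python. The function name, the phrase list, and the sender_known flag are made up for illustration; a real filter would have far more rules, which is exactly the problem.

```python
# Hand-written rules, mirroring the three rules listed above.
# SPAM_PHRASES and is_spam are hypothetical names used only for this sketch.
SPAM_PHRASES = ["free money", "click here now"]

def is_spam(body: str, sender_known: bool) -> bool:
    """Return True if any hand-written rule fires."""
    text = body.lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return True
    if not sender_known:
        return True
    return False

print(is_spam("Claim your FREE MONEY today", sender_known=True))  # True: phrase rule fires
print(is_spam("Claim your fr33 m0ney today", sender_known=True))  # False: same spam, rules miss it
```

The second call is the "fr33 m0ney" trick in action: nothing about the email changed except the spelling, and the rules no longer catch it.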

This question, "what if we didn't write the rules, but let the computer figure them out from examples?", is exactly what gave birth to Machine Learning.


The Full Picture — Nested Layers

Here is the most important diagram you'll see in this course:

┌─────────────────────────────────────────────┐
│                                             │
│   ARTIFICIAL INTELLIGENCE                   │
│   (Any machine that mimics human            │
│    intelligence)                            │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │                                     │   │
│   │   MACHINE LEARNING                  │   │
│   │   (Machines that learn from data)   │   │
│   │                                     │   │
│   │   ┌─────────────────────────────┐   │   │
│   │   │                             │   │   │
│   │   │   DEEP LEARNING             │   │   │
│   │   │   (Learning using Neural    │   │   │
│   │   │    Networks)                │   │   │
│   │   │                             │   │   │
│   │   │   ┌─────────────────────┐   │   │   │
│   │   │   │                     │   │   │   │
│   │   │   │   LLM               │   │   │   │
│   │   │   │   (Deep Learning    │   │   │   │
│   │   │   │    on language)     │   │   │   │
│   │   │   │                     │   │   │   │
│   │   │   └─────────────────────┘   │   │   │
│   │   └─────────────────────────────┘   │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

Every LLM is Deep Learning. Every Deep Learning system is Machine Learning. Every Machine Learning system is AI.

But the reverse is NOT true.

Not every AI is an LLM. Not every ML model uses Deep Learning. Let's go through each layer properly.


Layer 1 — Artificial Intelligence

Definition: Any technique that allows a machine to mimic human intelligence or behavior.

This is the broadest, oldest term. It dates back to the 1950s. Much of early AI was hand-written rules: if this, then that. Little or no learning involved. In effect, a very sophisticated decision tree written by humans.

Examples of AI that is NOT Machine Learning:

  • A chess engine that follows hardcoded rules
  • A GPS that calculates shortest path using an algorithm
  • A spam filter built with hand-written rules (like we discussed above)

These are all "intelligent" behaviors — but no learning is happening. A human figured out the rules and encoded them.
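The GPS example is a good one to look at in code. Below is a rough sketch of Dijkstra's shortest-path algorithm, one classic way to solve this kind of routing problem (the text above doesn't say which algorithm a real GPS uses, so treat this as an illustration). The road map and its travel times are invented. Notice that nothing is learned from data here: a human designed every step.

```python
import heapq

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm: cheapest total cost from start to goal."""
    best = {start: 0}
    queue = [(0, start)]  # (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None  # goal is unreachable

# Hypothetical road map: travel minutes between places.
ROADS = {
    "home":   {"mall": 4, "school": 2},
    "school": {"mall": 1, "office": 7},
    "mall":   {"office": 3},
    "office": {},
}

print(shortest_distance(ROADS, "home", "office"))  # 6 (home -> school -> mall -> office)
```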

The limitation: You can only be as smart as the rules you write. And humans can't write rules for everything.


Layer 2 — Machine Learning

Definition: A subset of AI where the machine learns patterns from data — without being explicitly programmed with rules.

This is the shift that changed everything.

Instead of:

Developer writes rules → Computer follows them

It became:

Developer feeds data + correct answers → Computer figures out the rules itself

You give the spam filter 10,000 emails — some labeled "spam", some labeled "not spam". The algorithm studies them, finds patterns on its own, and builds its own internal rules. Rules you never wrote. Rules even you might not fully understand.
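Here is a small sketch of that shift, assuming a naive Bayes word-count classifier as the learning algorithm (one possible choice for illustration; the text above doesn't name a specific one). The six training emails are made up, and a real filter would need thousands. The point is that no spam phrase is hardcoded anywhere: the "rules" are just word statistics picked up from the labeled examples.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each word appears in spam and in non-spam ("ham") emails."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the higher naive Bayes log-score."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing so unseen words don't zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Tiny made-up training set, for illustration only.
EXAMPLES = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("cheap money offer click", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday with the team", "ham"),
    ("project notes and meeting agenda", "ham"),
]

model = train(EXAMPLES)
print(predict(model, "free money offer"))       # spam
print(predict(model, "monday meeting agenda"))  # ham
```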

Classic ML algorithms:

  • Linear Regression
  • Decision Trees
  • Random Forest
  • Support Vector Machines (SVM)
  • K-Nearest Neighbors

The limitation of classic ML: These algorithms work well on structured, tabular data, but they struggle with complex raw data such as images, audio, video, and natural language. The features (the measurable signals the model looks at) usually have to be identified and extracted by humans before the data is fed to the model. That's expensive, slow, and often impractical at scale.
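To get a feel for what "humans extract the features" means, here is a rough sketch of hand-written feature extraction for an email. The function name and the four features are hypothetical; each one is simply a human guess about what might signal spam, and a classic ML model would only ever see these numbers, not the raw text.

```python
import re

def extract_features(email: str) -> dict:
    """Turn raw email text into numbers a classic ML model can consume.

    Every feature here is a human guess about what might signal spam.
    """
    words = email.split()
    return {
        "num_words": len(words),
        "num_exclamations": email.count("!"),
        "num_links": len(re.findall(r"https?://", email)),
        "share_uppercase_words": sum(w.isupper() for w in words) / max(len(words), 1),
    }

features = extract_features("WIN BIG!!! Visit http://example.com now")
print(features)
```

If spammers change tactics, someone has to come back and invent new features. That manual loop is the bottleneck.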


Layer 3 — Deep Learning

Definition: A subset of Machine Learning that uses Neural Networks with many layers to automatically learn features from raw data.

The word "deep" refers to the depth of the neural network — meaning many layers stacked on top of each other.

Here's the key difference:

Classic ML:

Raw Data → [Human extracts features] → ML Model → Output

Deep Learning:

Raw Data → [Neural Network extracts features automatically] → Output

You don't tell the network "look for edges in images" or "look for verb phrases in sentences." It figures that out on its own, layer by layer.

A simple way to picture a Neural Network:

Input Layer        Hidden Layers          Output Layer
                   (the "deep" part)

   [pixel]  →→→  [edge detector]
   [pixel]  →→→  [shape detector]  →→→  [cat or dog?]
   [pixel]  →→→  [pattern finder]

Each layer learns something more abstract than the previous one. The first layer might detect edges. The next detects shapes. The next detects features like eyes or ears. The final layer says "this is a cat."
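The layer-by-layer idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the network has one hidden layer, and the weights are hand-picked for illustration, whereas in a real network training would find them. It computes "exactly one of the two inputs is on" (XOR), which a single layer of weighted sums cannot do. The hidden layer builds intermediate signals, and the output layer combines them.

```python
def relu(values):
    """Keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hand-picked weights for illustration only; training would normally find them.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = relu(dense(inputs, HIDDEN_W, HIDDEN_B))
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, "->", forward([a, b]))
```

Real image models follow the same pattern, just with many more layers and millions of learned weights instead of nine hand-picked ones.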

Deep Learning unlocked:

  • Image recognition (Google Photos)
  • Speech recognition (Siri, Alexa)
  • Translation (Google Translate)
  • And eventually — language generation

Layer 4 — LLM (Large Language Model)

Definition: A type of Deep Learning model trained on massive amounts of text data, specifically designed to understand and generate human language.

The key word is Large.

Not just big — astronomically large:

Model           Parameters
GPT-2 (2019)    1.5 billion
GPT-3 (2020)    175 billion
GPT-4 (2023)    Not officially disclosed (unofficial estimates: ~1 trillion or more)

Parameters are the internal numbers (weights) the model adjusts during training and then uses to make predictions. More parameters generally means more capacity to learn patterns, which, given enough training data, tends to mean better at language.
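To see where these counts come from, here is a small worked example that counts the parameters of a plain fully connected network. The layer sizes are hypothetical, and real LLMs use a different architecture, so this is only meant to show how quickly the numbers grow.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output neuron
    return total

# Hypothetical network: 784 inputs (a 28x28 image), two hidden layers, 10 outputs.
print(count_parameters([784, 128, 64, 10]))  # 109386
```

Even this tiny network has over a hundred thousand parameters. GPT-3 has more than a million times as many.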

What makes an LLM different from previous Deep Learning:

Previous deep learning models were mostly narrow — one model for images, one for speech, one for translation. Each was trained for one specific task.

LLMs are trained on a broad mix of text all at once (books, articles, code, websites, conversations) and they learn a general model of language that transfers across tasks. The same model can answer questions, write code, translate languages, and summarize documents.

How an LLM actually generates text:

This is important to understand correctly from the start.

An LLM does not look up answers in a database. It does not "know" facts the way a search engine does. What it does is:

Assign a probability to every possible next token (word piece), given all the text before it, and then pick one of the likely ones.

That's it. That's the core operation. Everything — every answer, every essay, every piece of code — is just the model repeatedly asking: "Given everything written so far, what should come next?"

Input:  "The capital of France is"
Model:  What token comes next? → "Paris" (highest probability)

Input:  "The capital of France is Paris"
Model:  What comes next? → "." or "," or "which" ...

This is called autoregressive generation — each new token is fed back in to predict the next one.
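The loop itself is simple enough to sketch. The toy below is not how a real LLM works inside: it assumes a bigram model that only looks at the last word and counts which word followed it in a tiny made-up corpus, where a real LLM uses a neural network over the whole preceding text. What it does share with an LLM is the generation loop: predict, append, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """For each word, count which words follow it in the training text."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def generate(model, prompt, max_new_tokens=5):
    """Greedy autoregressive loop: predict, append, repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        next_token = candidates.most_common(1)[0][0]  # highest count wins
        tokens.append(next_token)  # feed the prediction back in
    return " ".join(tokens)

# Tiny made-up corpus; "france is paris" appears twice so it outweighs "italy is rome".
CORPUS = (
    "the capital of france is paris . "
    "the capital of france is paris . "
    "the capital of italy is rome ."
)
model = train_bigram(CORPUS)
print(generate(model, "the capital", max_new_tokens=4))  # the capital of france is paris
```

Always taking the top candidate is called greedy decoding. Real systems often sample from the likely candidates instead, which is why the same prompt can produce different answers.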


Where Does ChatGPT Fit?

ChatGPT is not just an LLM. It's an LLM plus several layers built on top:

Base LLM (a GPT-series model such as GPT-3.5 or GPT-4)
      ↓
Fine-tuned on conversations
      ↓
RLHF — Reinforcement Learning from Human Feedback
(Humans rated responses, model learned to give better answers)
      ↓
Safety filters and system prompts
      ↓
ChatGPT (the product you use)

The base model just predicts tokens. ChatGPT has been trained further to be helpful, harmless, and honest, so that it behaves like an assistant rather than simply continuing whatever text it is given.


The Real Difference — One Line Each

Term            One-Line Definition
AI              Any machine that mimics human intelligence
ML              Machines that learn patterns from data
Deep Learning   ML using multi-layered neural networks
LLM             Deep Learning model trained on massive text data
ChatGPT         An LLM fine-tuned to behave as a helpful assistant


The Mental Model to Remember

Think of it like cooking:

  • AI = the entire field of cooking (any method, any cuisine)
  • ML = learning to cook by tasting and adjusting, not following a fixed recipe
  • Deep Learning = using a professional kitchen with 50 specialized tools to cook automatically
  • LLM = that professional kitchen, trained on a huge share of the world's written text
  • ChatGPT = the trained chef who not only can cook but knows how to serve you politely

What You Now Understand

AI  ⊃  Machine Learning  ⊃  Deep Learning  ⊃  LLM  →  ChatGPT (a product built on an LLM)

  • AI is not magic; it started as hand-written rules
  • ML removed the need for humans to write every rule by hand
  • Deep Learning removed most of the need for humans to extract features
  • LLMs took Deep Learning and applied it to human language at massive scale
  • ChatGPT is an LLM shaped into a product through fine-tuning

3-Line Summary

  1. AI is any intelligent machine behavior — ML is a subset where the machine learns from data instead of following fixed rules.
  2. Deep Learning is ML with neural networks that automatically extract patterns, with little or no human feature engineering.
  3. An LLM is a massive Deep Learning model trained on text — it generates language by predicting the next most probable token, one at a time.

Module 1.1 — Complete ✅

Next up is Module 1.2 — What is Generative AI, and How ChatGPT Actually Works — we'll go deeper into what "generating" actually means, what makes AI "generative", and we'll trace a single message from your keyboard all the way through ChatGPT and back.

