PHASE 1 — Descriptive Statistics (Start Here)
Why: Before any ML model, you need to understand and summarize your data. Pandas .describe(), .mean(), .std() — all of this is descriptive stats.
Topics:
- Types of Data — Nominal, Ordinal, Continuous, Discrete
- Measures of Central Tendency — Mean, Median, Mode (and when to use which)
- Measures of Spread — Variance, Standard Deviation, Range, IQR
- Skewness & Kurtosis — Is your data symmetric? Heavy-tailed?
- Percentiles & Quantiles — Box plots, outlier detection
- Covariance & Correlation — How two variables move together (Pearson, Spearman)
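Nearly all of Phase 1 maps onto a handful of pandas calls. A minimal sketch, using made-up height/weight data (the column names and distribution parameters are arbitrary choices for illustration):

```python
import pandas as pd
import numpy as np

# Synthetic illustrative dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 10, 500),
    "weight_kg": rng.normal(70, 12, 500),
})

print(df.describe())           # count, mean, std, min, quartiles, max
print(df["height_cm"].skew())  # skewness: near 0 for symmetric data
print(df["height_cm"].kurt())  # excess kurtosis: near 0 for normal data

# IQR-based outlier fence (the rule behind box-plot whiskers)
q1, q3 = df["height_cm"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["height_cm"] < q1 - 1.5 * iqr) |
              (df["height_cm"] > q3 + 1.5 * iqr)]

# How the two columns move together
print(df.corr(method="pearson"))
print(df.corr(method="spearman"))
```

Here Pearson measures linear association while Spearman ranks the values first, so it also catches monotonic but non-linear relationships.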
PHASE 2 — Probability Theory
Why: ML models are probabilistic at their core. Naive Bayes, Logistic Regression, Neural Networks — all built on probability.
Topics:
- Basic Probability — Events, Sample Space, P(A), Complement
- Conditional Probability — P(A|B), independence
- Bayes' Theorem — The backbone of Naive Bayes classifier
- Random Variables — Discrete vs Continuous
- Probability Distributions:
- Bernoulli, Binomial (for classification problems)
- Normal / Gaussian (most important — appears everywhere)
- Poisson (event counting)
- Uniform, Exponential
- Expected Value & Variance of distributions
- Central Limit Theorem — Why we assume normality in many algorithms
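These distributions are all available as objects in scipy.stats, and the Central Limit Theorem can be seen directly in a few lines. A sketch with arbitrary example parameters:

```python
import numpy as np
from scipy import stats

# Normal: P(X <= 180) for X ~ N(170, 10)
p = stats.norm.cdf(180, loc=170, scale=10)   # about 0.84

# Binomial: probability of exactly 7 heads in 10 fair coin flips
p7 = stats.binom.pmf(7, n=10, p=0.5)

# Expected value and variance straight from the distribution object
mean, var = stats.binom.stats(n=10, p=0.5, moments="mv")  # 5.0, 2.5

# Central Limit Theorem: means of samples drawn from a skewed
# (exponential) distribution are approximately normal
rng = np.random.default_rng(42)
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)
# sample_means clusters around 1.0 with std roughly 1 / sqrt(50)
```

The last two lines are the CLT in action: the underlying data is heavily skewed, yet a histogram of `sample_means` would look like a bell curve.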
PHASE 3 — Inferential Statistics
Why: You work with samples, not entire populations. This phase teaches you how to draw conclusions and measure confidence in your findings.
Topics:
- Population vs Sample
- Sampling Methods — Random, Stratified, etc.
- Hypothesis Testing:
- Null Hypothesis (H0) vs Alternative Hypothesis (H1)
- p-value — what it actually means (very misunderstood)
- Significance Level (alpha = 0.05)
- Type I Error (False Positive) and Type II Error (False Negative)
- Z-test and T-test — Comparing means
- Chi-Square Test — For categorical data relationships
- ANOVA — Comparing means across multiple groups
- Confidence Intervals — "I am 95% confident the true mean lies here"
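In practice, most of this phase runs through scipy.stats. A sketch comparing two made-up groups (the "control"/"treatment" names and all values are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(100, 15, 200)     # synthetic group A
treatment = rng.normal(105, 15, 200)   # synthetic group B, shifted mean

# Two-sample t-test: H0 = "the two group means are equal"
t_stat, p_value = stats.ttest_ind(control, treatment)
if p_value < 0.05:
    print("Reject H0 at alpha = 0.05")

# 95% confidence interval for the treatment group's mean
ci = stats.t.interval(
    0.95,
    df=len(treatment) - 1,
    loc=treatment.mean(),
    scale=stats.sem(treatment),
)
print(ci)
```

Note what the p-value is: the probability of seeing a difference at least this large *if H0 were true* — not the probability that H0 is true.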
PHASE 4 — Linear Algebra
Why: Every ML model operates on matrices and vectors internally. Neural networks, PCA, SVD, image data — pure linear algebra.
Topics:
- Scalars, Vectors, Matrices, Tensors — What they are and how NumPy maps to them
- Matrix Operations — Addition, Multiplication, Transpose
- Dot Product — Core operation in every neural network layer
- Identity Matrix & Inverse Matrix
- Determinant — Tells you when a matrix is invertible (i.e., when a linear system has a unique solution)
- Eigenvalues & Eigenvectors — Critical for PCA (dimensionality reduction)
- Singular Value Decomposition (SVD) — Used in recommendation systems, NLP
- Norms (L1, L2) — Used in regularization (Ridge, Lasso regression)
- Orthogonality — Basis of PCA and feature independence
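Most of these topics correspond one-to-one to numpy.linalg calls. A minimal sketch on a small symmetric matrix (the values are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Determinant: nonzero means A is invertible
det = np.linalg.det(A)                # 3.0

# Inverse and identity: A @ A_inv gives I
A_inv = np.linalg.inv(A)
I = A @ A_inv

# Eigen-decomposition (the math behind PCA)
eigvals, eigvecs = np.linalg.eig(A)   # eigenvalues 1 and 3

# Singular Value Decomposition (recommenders, NLP)
U, S, Vt = np.linalg.svd(A)

# L1 and L2 norms of a vector (the penalties in Lasso / Ridge)
v = np.array([3.0, -4.0])
l1 = np.linalg.norm(v, 1)             # |3| + |-4| = 7.0
l2 = np.linalg.norm(v, 2)             # sqrt(9 + 16) = 5.0

# Dot product: the core operation in every dense layer
w = np.array([0.5, 0.5])
print(v @ w)
```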
PHASE 5 — Calculus (Focused, not full course)
Why: Gradient Descent — the algorithm behind training nearly every modern ML model — is pure calculus. You don't need a full calculus course, but these specific concepts are non-negotiable.
Topics:
- Functions & Limits — Basic understanding
- Derivatives — Rate of change, slope of a curve
- Chain Rule — Essential for backpropagation in neural networks
- Partial Derivatives — When your function has multiple variables (it always does in ML)
- Gradient — Vector of all partial derivatives; tells you which direction to move
- Gradient Descent — How models learn by minimizing loss
- Minima & Maxima — Finding where the function is lowest (minimizing error)
- Integrals (light) — Area under curve; used in probability distributions
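Gradient descent itself fits in a few lines. A minimal sketch minimizing a one-variable quadratic (the function, starting point, and learning rate are all arbitrary choices):

```python
# Minimize f(w) = (w - 3)^2 with plain gradient descent.
# Its derivative f'(w) = 2 * (w - 3) points uphill,
# so each step moves in the opposite direction.

def grad(w):
    return 2 * (w - 3)

w = 0.0      # starting guess
lr = 0.1     # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)

print(w)     # converges toward the minimum at w = 3
```

The loss surfaces of real models have millions of variables, so the scalar derivative becomes a gradient vector and the update becomes `w -= lr * gradient`, but the idea is exactly this.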
PHASE 6 — Information Theory (Before NLP / Advanced ML)
Why: Decision Trees, Random Forests, and all NLP models use these concepts directly.
Topics:
- Entropy — Measure of uncertainty/randomness in data
- Information Gain — How much a feature reduces uncertainty (used in Decision Trees)
- Cross-Entropy Loss — The loss function used in classification models
- KL Divergence — Difference between two probability distributions
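All four quantities are a few lines of NumPy each, and the identity KL(p || q) = cross-entropy(p, q) − entropy(p) ties them together. A sketch using two toy coin distributions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; skips zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """Expected bits to encode samples from p with a code built for q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """KL(p || q): extra bits paid for using q instead of p; always >= 0."""
    return cross_entropy(p, q) - entropy(p)

fair = [0.5, 0.5]
biased = [0.9, 0.1]

print(entropy(fair))                 # 1.0 bit: maximum uncertainty for 2 outcomes
print(entropy(biased))               # ~0.47 bits: more predictable
print(kl_divergence(fair, biased))   # > 0: the distributions differ
```

Information Gain in a Decision Tree is this same `entropy` function: the entropy of the labels before a split minus the weighted entropy after it.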
PHASE 7 — Optimization (Before Deep Learning)
Why: Training any model = solving an optimization problem.
Topics:
- Loss Functions — MSE, MAE, Cross-Entropy — what they measure and when to use
- Convex vs Non-Convex problems
- Gradient Descent variants — Batch, Stochastic (SGD), Mini-Batch
- Learning Rate — Too high vs too low
- Momentum, Adam Optimizer — Why plain gradient descent is often not enough
- Regularization (L1/L2) — Preventing overfitting using math
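Several of these pieces come together in one short script: MSE loss, mini-batch SGD, momentum, a learning rate, and an L2 penalty. A sketch that fits a one-parameter line to synthetic data; every hyperparameter value here is an arbitrary choice:

```python
import numpy as np

# Synthetic data from y = 2 * x + noise; the goal is to recover w = 2
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = 2.0 * x + rng.normal(0, 0.1, 1000)

w, velocity = 0.0, 0.0
lr, momentum, l2 = 0.1, 0.9, 1e-4
batch_size = 32

for epoch in range(20):
    idx = rng.permutation(len(x))          # reshuffle each epoch
    for start in range(0, len(x), batch_size):
        b = idx[start:start + batch_size]  # one mini-batch
        err = w * x[b] - y[b]
        # Gradient of mean((w*x - y)^2) + l2 * w^2 with respect to w
        g = 2 * np.mean(err * x[b]) + 2 * l2 * w
        # Momentum: accumulate a velocity instead of stepping directly
        velocity = momentum * velocity - lr * g
        w += velocity

print(w)   # close to the true slope 2.0
```

Swapping the two `velocity` lines for `w -= lr * g` gives plain mini-batch SGD; optimizers like Adam go further by also adapting the step size per parameter.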
Recommended Study Order
Phase 1 → Phase 2 → Phase 3 → Phase 4 → Phase 5 → Phase 6 → Phase 7
You don't need to fully complete one phase before starting the next. Once you're 70-80% comfortable, move forward and come back when a concept blocks you.
Practical Mapping (Math → Python Library)
- Descriptive Stats → pandas, numpy
- Probability & Distributions → scipy.stats
- Hypothesis Testing → scipy.stats, statsmodels
- Linear Algebra → numpy.linalg
- Calculus / Optimization → Conceptual understanding, then PyTorch/TensorFlow autograd
- Visualization of all above → matplotlib, seaborn
What You Can Skip (for now)
- Real Analysis, Topology — pure math, not needed for applied ML
- Full integral calculus — you need the concept, not manual computation
- Complex Number theory — not relevant for standard ML
Honest Time Estimate
- Phase 1-3 (Stats): 3-4 weeks at comfortable pace
- Phase 4 (Linear Algebra): 2-3 weeks
- Phase 5 (Calculus): 2 weeks (focused, not full course)
- Phase 6-7: 1-2 weeks each
Total: roughly 3-4 months if studying alongside your Python/ML work. The stats and linear algebra will immediately make your pandas and numpy usage much more intuitive.