What is the Mean?
We calculate the mean (average) because it gives a single value that represents the whole dataset.
Why we need mean (benefits):
- Easy understanding: Instead of looking at many numbers, one value summarizes everything.
- Quick comparison: You can easily compare different groups (e.g., average salary of two companies).
- Decision making: Helps in making decisions based on overall performance (e.g., average marks, sales).
- Finding trends: Shows general behavior of data (high, low, normal).
- Used in formulas: Mean is the base for many calculations like variance, standard deviation, etc.
Mean is the average of a set of numbers. You add all values together, then divide by how many values there are.
Simple idea: If 5 friends scored 60, 70, 80, 90, 100 in a test — what was the "typical" score? You find the mean.
Formula:
Mean (μ or x̄) = Sum of all values / Total count
= (x₁ + x₂ + x₃ + ... + xₙ) / n
Example:
Values: 60, 70, 80, 90, 100
Sum = 60 + 70 + 80 + 90 + 100 = 400
Count = 5
Mean = 400 / 5 = 80
Why Do We Use Mean?
Because we need one number that represents the whole dataset.
In ML, you can't feed 10,000 raw values into every formula. You need summaries. Mean is the most fundamental summary of data.
It answers: "If everything was equal, what would each value be?"
Types of Mean (All Used in ML)
1. Arithmetic Mean
This is the standard mean everyone knows. Add everything, divide by count.
import numpy as np
scores = [60, 70, 80, 90, 100]
mean = np.mean(scores)
print(mean)  # 80.0
Used in: Loss functions, accuracy calculation, gradient descent, feature scaling.
2. Weighted Mean
Weighted mean is used when all values are not equally important.
👉 In a normal mean, every value has the same importance.
👉 In a weighted mean, some values carry more importance (weight) than others.
Some values matter MORE than others, so you assign a weight to each value.
Formula:
Weighted Mean = (w₁x₁ + w₂x₂ + ... + wₙxₙ) / (w₁ + w₂ + ... + wₙ)
Example 1: You have 3 exams. Final exam is worth more.
import numpy as np

scores = [70, 80, 90]
weights = [1, 1, 3]  # Final exam has weight 3

weighted_mean = np.average(scores, weights=weights)
print(weighted_mean)  # 84.0

# Manual: (70*1 + 80*1 + 90*3) / (1+1+3) = 420/5 = 84
Example 2:
Marks:
- Math = 90 (weight = 50%)
- English = 80 (weight = 30%)
- Science = 70 (weight = 20%)
Now we don’t treat all subjects equally.
Weighted Mean =
(90 × 0.5) + (80 × 0.3) + (70 × 0.2)
= 45 + 24 + 14
= 83
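The same calculation can be checked with NumPy's `np.average`, which accepts a `weights` argument (a small sketch using the marks above):

```python
import numpy as np

# Same marks and percentage weights as above
marks = [90, 80, 70]          # Math, English, Science
weights = [0.5, 0.3, 0.2]

wm = np.average(marks, weights=weights)
print(wm)  # 83.0
```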
Used in: Ensemble models (XGBoost, Random Forest voting), class imbalance handling, recommendation systems.
3. Geometric Mean
🤔 First Understand the Problem — Why Arithmetic Mean Fails?
Suppose you have ₹100. You invest it:
| Year | Return | Your Money |
|------|--------|------------|
| Year 1 | +100% | ₹100 → ₹200 |
| Year 2 | −50% | ₹200 → ₹100 |
Arithmetic Mean says: (+100% + (−50%)) / 2 = +25% per year.
Reality check: Your money went ₹100 → ₹200 → ₹100 back. Real return = 0% 😐
So arithmetic mean showed 25% profit when actually there was 0% profit. This exact problem is solved by Geometric Mean.
🧠 Core Idea — Growth That Multiplies
Whenever one value grows on top of the previous value (compounding), use Geometric Mean.
| Type | Operation | Use When |
|------|-----------|----------|
| Arithmetic Mean | Adds numbers | Values are independent |
| Geometric Mean | Multiplies numbers | Each value depends on the previous one |
📐 Formula
Two steps only:
- Multiply all numbers together
- Take the nth root (n = how many numbers you have)

Geometric Mean = (x₁ × x₂ × ... × xₙ)^(1/n)
💰 Example 1 — Investment Returns
You have ₹1000. Returns over 3 years:
- Year 1: +10%
- Year 2: -20%
- Year 3: +30%
🔄 Step 0 — Convert % to Multiplier (Most Important Step)
Why do we convert? Because we need to multiply, not add. A multiplier tells us what to multiply the current amount by.
| Year | Return | How to Convert | Multiplier |
|------|--------|----------------|------------|
| Year 1 | +10% | 1.00 + 0.10 | 1.10 |
| Year 2 | −20% | 1.00 − 0.20 | 0.80 |
| Year 3 | +30% | 1.00 + 0.30 | 1.30 |
Rule: Always write the multiplier as 1 + (percent / 100):
- +10% → 1 + (10/100) = 1 + 0.10 = 1.10
- -20% → 1 + (-20/100) = 1 − 0.20 = 0.80
📊 Step 1 — Multiply All Multipliers
Calculate left to right:
1.10 × 0.80 = 0.88
0.88 × 1.30 = 1.144
Product = 1.144
🌱 Step 2 — Take the nth Root
Here n = 3 (three years), so we take the cube root:
∛1.144 ≈ 1.0459
The cube root asks: "what number multiplied by itself 3 times gives 1.144?"
🎯 Step 3 — Convert Back to Percentage
(1.0459 − 1) × 100 ≈ 4.59% average return per year
✅ Step 4 — Verify the Answer (Proof)
Actual path of money:
₹1000 → ×1.10 → ₹1100 → ×0.80 → ₹880 → ×1.30 → ₹1144
Using GM (≈4.59% every year):
₹1000 × 1.0459³ ≈ ₹1144
Both give the same final amount — so GM is correct!
What Arithmetic Mean would have given (wrong):
(10 − 20 + 30) / 3 = 6.67% per year → ₹1000 × 1.0667³ ≈ ₹1214, an amount that never existed.
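The verification can also be reproduced in code; a small sketch using `scipy.stats.gmean` on the same multipliers:

```python
from scipy.stats import gmean

multipliers = [1.10, 0.80, 1.30]  # +10%, -20%, +30%
gm = gmean(multipliers)

# Actual path of the money, year by year
money = 1000.0
for m in multipliers:
    money *= m

# GM path: grow by the same factor every year
gm_money = 1000.0 * gm ** 3

print(round(money, 2))     # 1144.0
print(round(gm_money, 2))  # 1144.0 -> both paths agree
```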
👨👩👧 Example 2 — Population Growth
City population = 10,00,000. Growth over 3 years:
- Year 1: +5%
- Year 2: +8%
- Year 3: +6%
🔄 Step 0 — Convert to Multipliers
| Year | Growth | Conversion | Multiplier |
|------|--------|------------|------------|
| Year 1 | +5% | 1 + 0.05 | 1.05 |
| Year 2 | +8% | 1 + 0.08 | 1.08 |
| Year 3 | +6% | 1 + 0.06 | 1.06 |
📊 Step 1 — Multiply All Multipliers
1.05 × 1.08 = 1.134
1.134 × 1.06 = 1.20204
Product = 1.20204
🌱 Step 2 — Take the Cube Root (n = 3)
∛1.20204 ≈ 1.0633
🎯 Step 3 — Convert to Percentage
(1.0633 − 1) × 100 ≈ 6.33% average growth per year
✅ Step 4 — Verify
Actual population growth:
10,00,000 → ×1.05 → 10,50,000 → ×1.08 → 11,34,000 → ×1.06 → 12,02,040
Using GM (≈6.33% every year):
10,00,000 × 1.0633³ ≈ 12,02,040
Both match perfectly!
🔁 Revisiting the ₹100 Problem (Now With GM)
+100% and -50% become multipliers 2.0 and 0.5:
2.0 × 0.5 = 1.0, and √1.0 = 1.0 → 0% per year
GM correctly said 0% return. Arithmetic mean had wrongly said +25%.
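The same check takes one line of code (a quick sketch):

```python
from scipy.stats import gmean

# +100% -> multiplier 2.0, -50% -> multiplier 0.5
gm = gmean([2.0, 0.5])
print(gm)  # 1.0 -> the average yearly multiplier is 1, i.e. 0% return
```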
🐍 Python Code With Explanation
from scipy.stats import gmean
# Step 1: Write returns as multipliers
returns = [1.10, 0.80, 1.30]  # +10%, -20%, +30%

# Step 2: gmean multiplies all values and takes the nth root automatically
gm = gmean(returns)

# Step 3: Convert back to percentage
print(f"Geometric Mean : {gm:.4f}")               # 1.0459
print(f"Avg return/year: {(gm - 1) * 100:.2f}%")  # 4.59%
📌 When to Use — Quick Reference
| Situation | Correct Mean |
|-----------|--------------|
| Average marks, height, weight | Arithmetic Mean ✅ |
| Investment / stock returns | Geometric Mean ✅ |
| Population growth | Geometric Mean ✅ |
| Any % change over time | Geometric Mean ✅ |
| Each value builds on previous | Geometric Mean ✅ |
🎯 One Line Summary
Whenever money or any quantity grows on top of the previous result (compounding), always use Geometric Mean — Arithmetic Mean will give you a wrong answer.
4. Harmonic Mean
Harmonic Mean is used when values are related to speed, rate, or “per unit” things.
👉 Like:
- speed (km/h)
- price per item
- work per hour
What it actually means
It gives the true average when the underlying quantities involve division (rates), not addition or multiplication.
👉 Special case: When you travel the same distance with different speeds
Simple example idea
You travel:
- Half the distance at 60 km/h
- Half the distance at 40 km/h
👉 Normal average = (60 + 40) / 2 = 50 km/h ❌ WRONG
👉 Because the time spent at each speed is different
👉 Harmonic Mean gives the correct average speed: 2 / (1/60 + 1/40) = 48 km/h
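The speed example can be confirmed with `scipy.stats.hmean`, plus a time-based check (assuming a 120 km half-distance for easy numbers):

```python
from scipy.stats import hmean

avg_speed = hmean([60, 40])
print(round(avg_speed, 2))  # 48.0 km/h, not 50

# Verify with actual times: 120 km at each speed
total_distance = 240
total_time = 120 / 60 + 120 / 40    # 2 h + 3 h = 5 h
print(total_distance / total_time)  # 48.0
```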
Why we need it:
- When values are rates (per unit)
- When the denominator matters (time, distance, etc.)
- Gives the truly accurate result in such cases
In one line:
Harmonic mean is used to find the correct average when dealing with speeds or rates (per unit values).
Reciprocal of the arithmetic mean of reciprocals. Sounds complex — but the use case makes it click.
Formula:
Harmonic Mean = n / (1/x₁ + 1/x₂ + ... + 1/xₙ)
✅ Example — Work Rate
Two machines complete same work:
- Machine A → 6 hours
- Machine B → 12 hours
Step 1: Formula
Harmonic Mean = n / (1/x₁ + 1/x₂) = 2 / (1/6 + 1/12)
Step 2: Solve
1/6 + 1/12 = (2 + 1) / 12 = 3/12 = 1/4
Step 3: Final
2 ÷ (1/4) = 8 hours
👉 Final Answer: Average time = 8 hours
Example:
from scipy.stats import hmean
values = [4, 1]
h_mean = hmean(values)
print(h_mean)  # 1.6
The most important use in ML — F1 Score:
precision = 0.80
recall = 0.60
# F1 Score IS the harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # ≈ 0.686

# Why harmonic and not arithmetic?
# Arithmetic mean of 0.8 and 0.6 = 0.70 (too generous)
# Harmonic mean punishes imbalance — if either is low, F1 is low
Used in: F1 Score, averaging rates, anywhere balance between two metrics matters.
5. Moving Average (Rolling Mean)
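A rolling mean averages only the last k values, sliding the window forward one step at a time, which smooths out short-term noise. A minimal sketch with pandas (the series and window size 3 are made-up choices):

```python
import pandas as pd

prices = pd.Series([10, 12, 11, 13, 15, 14, 16])

# Each point becomes the mean of itself and the 2 previous points
rolling = prices.rolling(window=3).mean()
print(rolling.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0, 15.0]
```

The first two entries are NaN because a full window of 3 values is not yet available there.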
6. Exponential Moving Average (EMA)
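An EMA weights recent values more, with older values fading exponentially. A sketch using pandas `ewm` (alpha=0.5 is an arbitrary smoothing factor; `adjust=False` gives the recursive form EMA_t = α·x_t + (1−α)·EMA_(t−1)):

```python
import pandas as pd

prices = pd.Series([10, 12, 11, 13, 15])

# EMA_t = 0.5 * x_t + 0.5 * EMA_(t-1), starting from the first value
ema = prices.ewm(alpha=0.5, adjust=False).mean()
print(ema.tolist())  # [10.0, 11.0, 11.0, 12.0, 13.5]
```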
Mean in Core ML Concepts
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
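The three metrics named above differ only in how they aggregate the errors; a sketch with NumPy on made-up numbers:

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.0])
pred = np.array([2.5, 5.5, 4.0])

mae = np.mean(np.abs(actual - pred))  # average absolute error
mse = np.mean((actual - pred) ** 2)   # squaring punishes big errors harder
rmse = np.sqrt(mse)                   # back in the original units

print(mae, mse, round(rmse, 4))  # 1.0 1.5 1.2247
```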
Mean in Feature Scaling — Standardization (Z-score)
Before feeding data into ML models, you scale features. Mean is the center point.
Formula:
z = (x - mean) / standard_deviation
from sklearn.preprocessing import StandardScaler
data = [[25], [30], [35], [40], [45]]
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(scaled)
# After scaling: mean becomes 0, std becomes 1
# [-1.41, -0.71, 0.0, 0.71, 1.41]
Why? Algorithms like Linear Regression, SVM, KNN, Neural Networks assume features are on similar scales. Without this, the feature with larger numbers dominates unfairly.
Mean in Gradient Descent
When you train a model, the loss function uses mean over all training examples.
Loss = (1/n) × Σ (predicted - actual)²
The gradient (direction to update weights) is also the mean of gradients across all samples. The model learns by minimizing this average error.
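To see this mean at work, here is a toy sketch fitting a single weight w so that w·x ≈ y (the data and learning rate are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # true relationship: y = 2x
w = 0.0
lr = 0.1

for _ in range(100):
    pred = w * x
    # gradient of the mean squared error = mean of per-sample gradients
    grad = np.mean(2 * (pred - y) * x)
    w -= lr * grad

print(round(w, 4))  # 2.0 -> learned by repeatedly shrinking the average error
```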
Mean Imputation (Handling Missing Data)
When data has missing values, a simple strategy is to fill them with the mean of that column.
import pandas as pd
import numpy as np
df = pd.DataFrame({'age': [25, 30, np.nan, 40, np.nan, 35]})
mean_age = df['age'].mean() # 32.5
df['age'] = df['age'].fillna(mean_age)
print(df)
# NaN values replaced with 32.5
When to use: Works well when data is roughly normally distributed and not too many values are missing.
Mean in Batch Normalization (Neural Networks)
Inside deep neural networks, after each layer, the activations are normalized using mean and standard deviation of the current batch. This keeps training stable and fast.
import torch
import torch.nn as nn
# PyTorch example
bn = nn.BatchNorm1d(num_features=4)
x = torch.tensor([[1.0, 2.0, 3.0, 4.0],
[5.0, 6.0, 7.0, 8.0]])
output = bn(x)
# Internally: subtracts mean, divides by std, for each feature
Mean vs Median — When Mean Fails
Mean has one big weakness: outliers destroy it.
import numpy as np

salaries = [30000, 32000, 31000, 29000, 500000]  # one extreme earner in the group
mean_salary = np.mean(salaries)      # 124400.0 ← completely misleading
median_salary = np.median(salaries)  # 31000.0 ← represents the group better
Rule of thumb:
- Data has no extreme outliers → use Mean
- Data has outliers or is skewed → use Median
- Always check with a histogram or box plot before deciding
Quick Reference Summary
| Type | Formula | ML Use Case |
|------|---------|-------------|
| Arithmetic Mean | sum / n | Loss functions, scaling |
| Weighted Mean | Σ(wᵢxᵢ) / Σwᵢ | Ensembles, class weights |
| Geometric Mean | (x₁ × x₂ × ... × xₙ)^(1/n) | Growth rates, log-scale eval |
| Harmonic Mean | n / Σ(1/xᵢ) | F1 Score, rate averaging |
| Rolling Mean | Mean of last k values | Time series smoothing |
| EMA | Weighted recent average | Adam optimizer, forecasting |
| MAE | mean(\|actual − pred\|) | Regression evaluation |
| MSE | mean((actual − pred)²) | Regression loss function |
| RMSE | √MSE | Regression evaluation |
One-Line Memory Hook for Each
- Arithmetic → "The everyday average"
- Weighted → "Some things matter more"
- Geometric → "For growth and multiplication"
- Harmonic → "For rates and balance — F1 lives here"
- Rolling → "Sliding window over time"
- EMA → "Recent past matters more"
- MAE → "Average of how wrong you were"
- MSE → "Punish big mistakes harder"
- RMSE → "MSE in original units"
That's the complete Mean chapter — from the basic definition all the way to how it powers neural network training, model evaluation, and data preprocessing in real ML pipelines.
