Start With The Problem — Where MAE Falls Short
Same house price example. Two different models:
Model A errors: 1, 1, 1, 11, 1 → MAE = 3.0
Model B errors: 0, 0, 0, 1, 40 → MAE = 8.2
OK, so MAE clearly shows Model B is worse here. But now imagine both models had the same MAE of 5:
Model A errors: 5, 5, 5, 5, 5 → MAE = 5
Model B errors: 1, 1, 1, 1, 21 → MAE = 5 ← one massive mistake!
MAE says both are equal. But clearly Model B is dangerous — it made one huge blunder. You want to catch and punish big errors harder.
That's exactly what MSE does.
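To make that concrete, here is a quick sketch (plain Python, using the two error lists from the example above) showing that MAE ties the two models while MSE separates them:

```python
# Same MAE, very different MSE
errors_a = [5, 5, 5, 5, 5]    # steady, moderate errors
errors_b = [1, 1, 1, 1, 21]   # mostly good, one huge blunder

mae_a = sum(abs(e) for e in errors_a) / len(errors_a)
mae_b = sum(abs(e) for e in errors_b) / len(errors_b)
mse_a = sum(e**2 for e in errors_a) / len(errors_a)
mse_b = sum(e**2 for e in errors_b) / len(errors_b)

print(mae_a, mae_b)  # 5.0 5.0  -> MAE cannot tell them apart
print(mse_a, mse_b)  # 25.0 89.0 -> MSE flags the big blunder
```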
What is MSE?
MSE = Average of SQUARED differences between actual and predicted values
Same 3 steps as MAE but with one change:
- Find the error (Actual − Predicted)
- Square each error (instead of absolute value)
- Take the average
Squaring does two things — makes everything positive AND punishes big errors much harder.
The Formula
MSE = (1/n) × Σ (Actual - Predicted)²
Same as MAE formula, just square instead of absolute value.
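The formula translates directly into a few lines of Python. A minimal sketch (`mse` here is just an illustrative helper name, not a library function):

```python
def mse(actual, predicted):
    """Mean Squared Error: average of squared differences."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# Errors of +3 and -3 both contribute 9 after squaring
print(mse([10, 10], [7, 13]))  # 9.0
```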
Manual Walkthrough — Step by Step
| House | Actual | Predicted | Error | Error² |
|---|---|---|---|---|
| 1 | 50 | 45 | 5 | 25 |
| 2 | 80 | 85 | -5 | 25 |
| 3 | 60 | 58 | 2 | 4 |
| 4 | 90 | 95 | -5 | 25 |
| 5 | 70 | 65 | 5 | 25 |
Step 1 — Sum of squared errors:
25 + 25 + 4 + 25 + 25 = 104
Step 2 — Divide by n (5 houses):
MSE = 104 / 5 = 20.8
Result: MSE = 20.8
The Squaring Effect — This is the KEY idea
See what squaring does to errors of different sizes:
| Error | After Absolute (MAE) | After Squaring (MSE) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 4 |
| 5 | 5 | 25 |
| 10 | 10 | 100 |
| 20 | 20 | 400 |
A 10x bigger error gets 100x bigger penalty in MSE.
This is why MSE is said to heavily penalize large errors. Small errors barely matter, big errors scream loudly.
Visual — MAE vs MSE on Same Error
Error = 1 → MAE adds 1 | MSE adds 1
Error = 2 → MAE adds 2 | MSE adds 4
Error = 5 → MAE adds 5 | MSE adds 25
Error = 10 → MAE adds 10 | MSE adds 100 ← huge jump
Error = 20 → MAE adds 20 | MSE adds 400 ← MSE going crazy
Model with one huge mistake will have a massive MSE even if all other predictions are perfect.
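The escalating penalties shown above can be reproduced with a short loop (plain Python, using the same error sizes listed):

```python
# How much each error adds to MAE vs MSE
rows = [(e, abs(e), e**2) for e in [1, 2, 5, 10, 20]]
for e, mae_add, mse_add in rows:
    print(f"Error = {e:>2} -> MAE adds {mae_add:>2} | MSE adds {mse_add:>3}")
```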
Python Program
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt

# --- Data ---
actual = [50, 80, 60, 90, 70]
predicted = [45, 85, 58, 95, 65]

# --- Manual Calculation ---
errors = [a - p for a, p in zip(actual, predicted)]
squared_errors = [e**2 for e in errors]
mse_manual = sum(squared_errors) / len(squared_errors)

print("=== Manual Calculation ===")
print(f"Errors : {errors}")
print(f"Squared Errors : {squared_errors}")
print(f"MSE (manual) : {mse_manual}")

# --- Using NumPy ---
mse_numpy = np.mean((np.array(actual) - np.array(predicted))**2)
print(f"\nMSE (numpy) : {mse_numpy}")

# --- Using Scikit-learn ---
mse_sklearn = mean_squared_error(actual, predicted)
print(f"MSE (sklearn) : {mse_sklearn}")

# --- DataFrame view ---
df = pd.DataFrame({
    'Actual' : actual,
    'Predicted' : predicted,
    'Error' : errors,
    'Squared Error' : squared_errors
})
print(f"\n{df.to_string(index=False)}")

# --- Comparing MAE vs MSE on a bad outlier model ---
actual2 = [50, 80, 60, 90, 70]
predicted2 = [50, 80, 60, 89, 30]  # last prediction is way off

print("\n=== Outlier Effect Comparison ===")
print(f"Normal Model → MAE: {mean_absolute_error(actual, predicted):.2f} | MSE: {mean_squared_error(actual, predicted):.2f}")
print(f"Outlier Model → MAE: {mean_absolute_error(actual2, predicted2):.2f} | MSE: {mean_squared_error(actual2, predicted2):.2f}")

# --- Plot ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1 - Actual vs Predicted
axes[0].plot(range(1, 6), actual, label='Actual', marker='o', linewidth=2)
axes[0].plot(range(1, 6), predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')
for i in range(5):
    axes[0].vlines(i+1, min(actual[i], predicted[i]), max(actual[i], predicted[i]),
                   colors='red', linewidth=2, alpha=0.6)
axes[0].set_title('Actual vs Predicted')
axes[0].set_xlabel('House')
axes[0].set_ylabel('Price (Lakhs)')
axes[0].legend()
axes[0].grid(True)

# Plot 2 - Squared errors bar chart
axes[1].bar(range(1, 6), squared_errors, color='tomato', alpha=0.8)
axes[1].set_title(f'Squared Errors per House (MSE = {mse_sklearn})')
axes[1].set_xlabel('House')
axes[1].set_ylabel('Squared Error')
axes[1].grid(True, axis='y')

plt.tight_layout()
plt.savefig('mse_plot.png')
plt.show()
print("\nPlot saved!")
Output:
=== Manual Calculation ===
Errors : [5, -5, 2, -5, 5]
Squared Errors : [25, 25, 4, 25, 25]
MSE (manual) : 20.8
MSE (numpy) : 20.8
MSE (sklearn) : 20.8
Actual Predicted Error Squared Error
50 45 5 25
80 85 -5 25
60 58 2 4
90 95 -5 25
70 65 5 25
=== Outlier Effect Comparison ===
Normal Model → MAE: 4.40 | MSE: 20.80
Outlier Model → MAE: 8.20 | MSE: 320.20
Plot saved!
The Big Weakness of MSE — Units Get Weird
An MAE of 4.4 Lakhs means the model is off by ₹4.4L on average. Easy to understand.
An MSE of 20.8 → 20.8 what? Lakhs²? That unit makes no real sense.
Price is in Lakhs
Error is in Lakhs
Error² is in Lakhs² ← no human thinks in Lakhs squared
This is MSE's main weakness — not directly interpretable in real world terms.
Solution → RMSE (Root Mean Squared Error) — just take square root of MSE, unit comes back to normal. That's the next concept.
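As a quick preview (assuming the MSE of 20.8 from the walkthrough above):

```python
import math

mse = 20.8             # from the walkthrough above, in Lakhs²
rmse = math.sqrt(mse)  # square root brings the unit back to Lakhs
print(round(rmse, 2))  # 4.56 -> "off by about 4.56 Lakhs"
```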
MAE vs MSE — Clear Comparison
| | MAE | MSE |
|---|---|---|
| Formula | Average of \|errors\| | Average of errors² |
| Unit | Same as target ✅ | Squared unit ❌ |
| Big error penalty | Equal weight | Very heavy penalty ✅ |
| Outlier sensitive | No — robust | Yes — very sensitive |
| Use when | Outliers exist in data | Big mistakes are unacceptable |
| Differentiable | Not at zero | Always ✅ (good for gradient descent) |
Why MSE is Loved in ML Training
This is important — MSE is not just an evaluation metric. It's often used as the loss function during model training itself.
# Linear Regression internally minimizes MSE during training
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # this model minimizes MSE by default
model.fit(X_train, y_train)
Why? Because MSE is smooth and differentiable everywhere — gradient descent can flow through it cleanly. MAE has a kink at 0 which causes issues in optimization.
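A small numerical sketch illustrates that kink (`slope` is an illustrative helper using central differences, not a library function):

```python
def slope(f, x, h=1e-6):
    """Numerical derivative of f at x via central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

sq = lambda e: e * e

# |e| has slope -1 then +1, jumping at 0; e² has slope 2e, smooth through 0
for x in [-0.1, -0.001, 0.001, 0.1]:
    print(f"e={x:+.3f}  d|e|/de={slope(abs, x):+.2f}  d(e²)/de={slope(sq, x):+.4f}")
```

The gradient of the absolute value flips abruptly from -1 to +1, while the gradient of the square shrinks smoothly to 0 — which is what makes gradient descent on MSE well behaved.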
Real ML Project Usage
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # fix the unit problem

print(f"MAE : {mae:.2f}")    # avg error, human readable
print(f"MSE : {mse:.2f}")    # penalizes big errors, but weird unit
print(f"RMSE : {rmse:.2f}")  # best of both worlds

# Always report all three together in real projects
In real projects — always calculate MAE, MSE, and RMSE together. Each tells you something slightly different about your model's behavior.
One Line Summary
MSE squares every error before averaging — making it extremely sensitive to large mistakes, which is perfect when big prediction errors are costly and unacceptable in your use case.