Start With The Problem — Where MAE Falls Short
Same house price example. Two different models:
Model A errors: 1, 1, 1, 11, 1 → MAE = 3.0
Model B errors: 0, 0, 0, 1, 40 → MAE = 8.2
OK, so MAE clearly shows Model B is worse here. But now imagine both models had the same MAE of 5:
Model A errors: 5, 5, 5, 5, 5 → MAE = 5
Model B errors: 1, 1, 1, 1, 21 → MAE = 5 ← one massive mistake!
MAE says both are equal. But clearly Model B is dangerous — it made one huge blunder. You want to catch and punish big errors harder.
That's exactly what MSE does.
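To make that concrete, here is a quick sketch (plain Python, using the two error lists from the example above) showing that MAE ties the two models while MSE separates them:

```python
# Same MAE, very different MSE
errors_a = [5, 5, 5, 5, 5]    # steady, moderate errors
errors_b = [1, 1, 1, 1, 21]   # mostly good, one huge blunder

mae_a = sum(abs(e) for e in errors_a) / len(errors_a)
mae_b = sum(abs(e) for e in errors_b) / len(errors_b)
mse_a = sum(e**2 for e in errors_a) / len(errors_a)
mse_b = sum(e**2 for e in errors_b) / len(errors_b)

print(mae_a, mae_b)  # 5.0 5.0  -> MAE cannot tell them apart
print(mse_a, mse_b)  # 25.0 89.0 -> MSE flags the big blunder
```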
What is MSE?
MSE = Average of SQUARED differences between actual and predicted values
Same 3 steps as MAE but with one change:
- Find the error (Actual − Predicted)
- Square each error (instead of absolute value)
- Take the average
Squaring does two things — makes everything positive AND punishes big errors much harder.
The Formula
MSE = (1/n) × Σ (Actual - Predicted)²
Same as MAE formula, just square instead of absolute value.
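The formula translates directly into a few lines of Python. A minimal sketch (`mse` here is just an illustrative helper name, not a library function):

```python
def mse(actual, predicted):
    """Mean Squared Error: average of squared differences."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# Errors of +3 and -3 both contribute 9 after squaring
print(mse([10, 10], [7, 13]))  # 9.0
```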
Manual Walkthrough — Step by Step
| House | Actual | Predicted | Error | Error² |
|---|---|---|---|---|
| 1 | 50 | 45 | 5 | 25 |
| 2 | 80 | 85 | -5 | 25 |
| 3 | 60 | 58 | 2 | 4 |
| 4 | 90 | 95 | -5 | 25 |
| 5 | 70 | 65 | 5 | 25 |
Step 1 — Sum of squared errors:
25 + 25 + 4 + 25 + 25 = 104
Step 2 — Divide by n (5 houses):
MSE = 104 / 5 = 20.8
Result: MSE = 20.8
The Squaring Effect — This is the KEY idea
See what squaring does to errors of different sizes:
| Error | After Absolute (MAE) | After Squaring (MSE) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 4 |
| 5 | 5 | 25 |
| 10 | 10 | 100 |
| 20 | 20 | 400 |
A 10x bigger error gets 100x bigger penalty in MSE.
This is why MSE is said to heavily penalize large errors. Small errors barely matter, big errors scream loudly.
Visual — MAE vs MSE on Same Error
Error = 1 → MAE adds 1 | MSE adds 1
Error = 2 → MAE adds 2 | MSE adds 4
Error = 5 → MAE adds 5 | MSE adds 25
Error = 10 → MAE adds 10 | MSE adds 100 ← huge jump
Error = 20 → MAE adds 20 | MSE adds 400 ← MSE going crazy
Model with one huge mistake will have a massive MSE even if all other predictions are perfect.
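The escalating penalties shown above can be reproduced with a short loop (plain Python, using the same error sizes listed):

```python
# How much each error adds to MAE vs MSE
rows = [(e, abs(e), e**2) for e in [1, 2, 5, 10, 20]]
for e, mae_add, mse_add in rows:
    print(f"Error = {e:>2} -> MAE adds {mae_add:>2} | MSE adds {mse_add:>3}")
```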
Python Program
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt

# --- Data ---
actual = [50, 80, 60, 90, 70]
predicted = [45, 85, 58, 95, 65]

# --- Manual Calculation ---
errors = [a - p for a, p in zip(actual, predicted)]
squared_errors = [e**2 for e in errors]
mse_manual = sum(squared_errors) / len(squared_errors)

print("=== Manual Calculation ===")
print(f"Errors : {errors}")
print(f"Squared Errors : {squared_errors}")
print(f"MSE (manual) : {mse_manual}")

# --- Using NumPy ---
mse_numpy = np.mean((np.array(actual) - np.array(predicted))**2)
print(f"\nMSE (numpy) : {mse_numpy}")

# --- Using Scikit-learn ---
mse_sklearn = mean_squared_error(actual, predicted)
print(f"MSE (sklearn) : {mse_sklearn}")

# --- DataFrame view ---
df = pd.DataFrame({
    'Actual' : actual,
    'Predicted' : predicted,
    'Error' : errors,
    'Squared Error' : squared_errors
})
print(f"\n{df.to_string(index=False)}")

# --- Comparing MAE vs MSE on a bad outlier model ---
actual2 = [50, 80, 60, 90, 70]
predicted2 = [50, 80, 60, 89, 30]  # last prediction is way off

print("\n=== Outlier Effect Comparison ===")
print(f"Normal Model → MAE: {mean_absolute_error(actual, predicted):.2f} | MSE: {mean_squared_error(actual, predicted):.2f}")
print(f"Outlier Model → MAE: {mean_absolute_error(actual2, predicted2):.2f} | MSE: {mean_squared_error(actual2, predicted2):.2f}")

# --- Plot ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1 - Actual vs Predicted
axes[0].plot(range(1, 6), actual, label='Actual', marker='o', linewidth=2)
axes[0].plot(range(1, 6), predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')
for i in range(5):
    axes[0].vlines(i+1, min(actual[i], predicted[i]), max(actual[i], predicted[i]),
                   colors='red', linewidth=2, alpha=0.6)
axes[0].set_title('Actual vs Predicted')
axes[0].set_xlabel('House')
axes[0].set_ylabel('Price (Lakhs)')
axes[0].legend()
axes[0].grid(True)

# Plot 2 - Squared errors bar chart
axes[1].bar(range(1, 6), squared_errors, color='tomato', alpha=0.8)
axes[1].set_title(f'Squared Errors per House (MSE = {mse_sklearn})')
axes[1].set_xlabel('House')
axes[1].set_ylabel('Squared Error')
axes[1].grid(True, axis='y')

plt.tight_layout()
plt.savefig('mse_plot.png')
plt.show()
print("\nPlot saved!")
Output:
=== Manual Calculation ===
Errors : [5, -5, 2, -5, 5]
Squared Errors : [25, 25, 4, 25, 25]
MSE (manual) : 20.8
MSE (numpy) : 20.8
MSE (sklearn) : 20.8
Actual Predicted Error Squared Error
50 45 5 25
80 85 -5 25
60 58 2 4
90 95 -5 25
70 65 5 25
=== Outlier Effect Comparison ===
Normal Model → MAE: 4.40 | MSE: 20.80
Outlier Model → MAE: 8.20 | MSE: 320.20
Plot saved!
The Big Weakness of MSE — Units Get Weird
An MAE of 4.4 Lakhs means the model is off by ₹4.4L on average. Easy to understand.
An MSE of 20.8 → 20.8 what? Lakhs²? That unit makes no real sense.
Price is in Lakhs
Error is in Lakhs
Error² is in Lakhs² ← no human thinks in Lakhs squared
This is MSE's main weakness — not directly interpretable in real world terms.
Solution → RMSE (Root Mean Squared Error) — just take square root of MSE, unit comes back to normal. That's the next concept.
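As a quick preview (assuming the MSE of 20.8 from the walkthrough above):

```python
import math

mse = 20.8             # from the walkthrough above, in Lakhs²
rmse = math.sqrt(mse)  # square root brings the unit back to Lakhs
print(round(rmse, 2))  # 4.56 -> "off by about 4.56 Lakhs"
```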
MAE vs MSE — Clear Comparison
| | MAE | MSE |
|---|---|---|
| Formula | Average of \|errors\| | Average of errors² |
| Unit | Same as target ✅ | Squared unit ❌ |
| Big error penalty | Equal weight | Very heavy penalty ✅ |
| Outlier sensitive | No — robust | Yes — very sensitive |
| Use when | Outliers exist in data | Big mistakes are unacceptable |
| Differentiable | Not at zero | Always ✅ (good for gradient descent) |
Why MSE is Loved in ML Training
This is important — MSE is not just an evaluation metric. It's often used as the loss function during model training itself.
# Linear Regression internally minimizes MSE during training
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # this model minimizes MSE by default
model.fit(X_train, y_train)
Why? Because MSE is smooth and differentiable everywhere — gradient descent can flow through it cleanly. MAE has a kink at 0 which causes issues in optimization.
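A small numerical sketch illustrates that kink (`slope` is an illustrative helper using central differences, not a library function):

```python
def slope(f, x, h=1e-6):
    """Numerical derivative of f at x via central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

sq = lambda e: e * e

# |e| has slope -1 then +1, jumping at 0; e² has slope 2e, smooth through 0
for x in [-0.1, -0.001, 0.001, 0.1]:
    print(f"e={x:+.3f}  d|e|/de={slope(abs, x):+.2f}  d(e²)/de={slope(sq, x):+.4f}")
```

The gradient of the absolute value flips abruptly from -1 to +1, while the gradient of the square shrinks smoothly to 0 — which is what makes gradient descent on MSE well behaved.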
Real ML Project Usage
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # fix the unit problem

print(f"MAE : {mae:.2f}")    # avg error, human readable
print(f"MSE : {mse:.2f}")    # penalizes big errors, but weird unit
print(f"RMSE : {rmse:.2f}")  # best of both worlds

# Always report all three together in real projects
In real projects — always calculate MAE, MSE, and RMSE together. Each tells you something slightly different about your model's behavior.
One Line Summary
MSE squares every error before averaging — making it extremely sensitive to large mistakes, which is perfect when big prediction errors are costly and unacceptable in your use case.