
Mean Squared Error (MSE)

Start With The Problem — Where MAE Falls Short

Same house price example. Two different models:

House   Actual   Model A Predicted   Model B Predicted
  1       50            49                  50
  2       80            79                  80
  3       60            59                  60
  4       90            79                  89
  5       70            69                  30


Model A errors: 1, 1, 1, 11, 1 → MAE = 3.0

Model B errors: 0, 0, 0, 1, 40 → MAE = 8.2

OK, so MAE clearly shows Model B is worse here. But now imagine both models had the same MAE = 5.

Model A errors: 5, 5, 5, 5, 5 → MAE = 5

Model B errors: 1, 1, 1, 1, 21 → MAE = 5 ← one massive mistake!

MAE says both are equal. But clearly Model B is dangerous — it made one huge blunder. You want to catch and punish big errors harder.

That's exactly what MSE does.
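The equal-MAE scenario above can be checked in a few lines of plain Python. This is a minimal sketch using the two made-up error lists from the example:

```python
# Sketch: two error lists with identical MAE but very different MSE
errors_a = [5, 5, 5, 5, 5]     # Model A: consistent medium errors
errors_b = [1, 1, 1, 1, 21]    # Model B: one massive blunder

mae_a = sum(abs(e) for e in errors_a) / len(errors_a)  # 5.0
mae_b = sum(abs(e) for e in errors_b) / len(errors_b)  # 5.0

mse_a = sum(e**2 for e in errors_a) / len(errors_a)    # 25.0
mse_b = sum(e**2 for e in errors_b) / len(errors_b)    # (4*1 + 441) / 5 = 89.0

print(mae_a, mae_b)   # same MAE: 5.0 5.0
print(mse_a, mse_b)   # MSE exposes Model B: 25.0 89.0
```

MAE cannot tell the two models apart, but MSE makes Model B's single blunder more than triple its score.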


What is MSE?

MSE = Average of SQUARED differences between actual and predicted values

Same 3 steps as MAE but with one change:

  1. Find the error (Actual − Predicted)
  2. Square each error (instead of absolute value)
  3. Take the average

Squaring does two things — makes everything positive AND punishes big errors much harder.


The Formula

MSE = (1/n) × Σ (Actual - Predicted)²

Same as the MAE formula, just with a square instead of the absolute value.
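The three steps translate directly into code. A quick sketch with hypothetical toy numbers (not the house data used in the walkthrough below):

```python
# Hypothetical toy values, just to show the formula as code
actual    = [10, 20, 30]
predicted = [12, 18, 30]

errors         = [a - p for a, p in zip(actual, predicted)]   # Step 1: Actual − Predicted
squared_errors = [e**2 for e in errors]                       # Step 2: square each error
mse            = sum(squared_errors) / len(squared_errors)    # Step 3: average

print(mse)   # (4 + 4 + 0) / 3 ≈ 2.67
```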


Manual Walkthrough — Step by Step

House   Actual   Predicted   Error   Error²
  1       50        45         5       25
  2       80        85        -5       25
  3       60        58         2        4
  4       90        95        -5       25
  5       70        65         5       25

Step 1 — Sum of squared errors:

25 + 25 + 4 + 25 + 25 = 104

Step 2 — Divide by n (5 houses):

MSE = 104 / 5 = 20.8

Result: MSE = 20.8


The Squaring Effect — This is the KEY idea

See what squaring does to errors of different sizes:

Error   After Absolute (MAE)   After Squaring (MSE)
  1              1                      1
  2              2                      4
  5              5                     25
 10             10                    100
 20             20                    400

A 2x bigger error gets a 4x bigger penalty in MSE.

A 10x bigger error gets a 100x bigger penalty in MSE.

This is why MSE is said to heavily penalize large errors. Small errors barely matter, big errors scream loudly.


Visual — MAE vs MSE on Same Error

Error = 1  →  MAE adds 1    |  MSE adds 1
Error = 2  →  MAE adds 2    |  MSE adds 4
Error = 5  →  MAE adds 5    |  MSE adds 25
Error = 10 →  MAE adds 10   |  MSE adds 100  ← huge jump
Error = 20 →  MAE adds 20   |  MSE adds 400  ← MSE going crazy

A model with one huge mistake will have a massive MSE even if all its other predictions are perfect.
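The penalty ladder above can be reproduced in a few lines. A minimal sketch:

```python
# Sketch: MAE penalty grows linearly, MSE penalty grows quadratically
for error in [1, 2, 5, 10, 20]:
    print(f"error={error:>2}  MAE penalty={abs(error):>3}  MSE penalty={error**2:>4}")

# Doubling the error (1 → 2) quadruples the MSE penalty (1 → 4)
assert 2**2 == 4 * 1**2
# A 10x bigger error (2 → 20) gets a 100x bigger penalty (4 → 400)
assert 20**2 == 100 * 2**2
```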


Python Program


    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    import matplotlib.pyplot as plt

    # --- Data ---
    actual    = [50, 80, 60, 90, 70]
    predicted = [45, 85, 58, 95, 65]

    # --- Manual Calculation ---
    errors         = [a - p for a, p in zip(actual, predicted)]
    squared_errors = [e**2 for e in errors]
    mse_manual     = sum(squared_errors) / len(squared_errors)

    print("=== Manual Calculation ===")
    print(f"Errors         : {errors}")
    print(f"Squared Errors : {squared_errors}")
    print(f"MSE (manual)   : {mse_manual}")

    # --- Using NumPy ---
    mse_numpy = np.mean((np.array(actual) - np.array(predicted))**2)
    print(f"\nMSE (numpy)    : {mse_numpy}")

    # --- Using Scikit-learn ---
    mse_sklearn = mean_squared_error(actual, predicted)
    print(f"MSE (sklearn)  : {mse_sklearn}")

    # --- DataFrame view ---
    df = pd.DataFrame({
        'Actual'         : actual,
        'Predicted'      : predicted,
        'Error'          : errors,
        'Squared Error'  : squared_errors
    })
    print(f"\n{df.to_string(index=False)}")

    # --- Comparing MAE vs MSE on a bad outlier model ---
    actual2    = [50, 80, 60, 90, 70]
    predicted2 = [50, 80, 60, 89, 30]   # last prediction is way off

    from sklearn.metrics import mean_absolute_error
    print("\n=== Outlier Effect Comparison ===")
    print(f"Normal Model  → MAE: {mean_absolute_error(actual, predicted):.2f}  | MSE: {mean_squared_error(actual, predicted):.2f}")
    print(f"Outlier Model → MAE: {mean_absolute_error(actual2, predicted2):.2f} | MSE: {mean_squared_error(actual2, predicted2):.2f}")

    # --- Plot ---
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Plot 1 - Actual vs Predicted
    axes[0].plot(range(1, 6), actual,    label='Actual',    marker='o', linewidth=2)
    axes[0].plot(range(1, 6), predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')
    for i in range(5):
        axes[0].vlines(i+1, min(actual[i], predicted[i]),
                            max(actual[i], predicted[i]),
                            colors='red', linewidth=2, alpha=0.6)
    axes[0].set_title('Actual vs Predicted')
    axes[0].set_xlabel('House')
    axes[0].set_ylabel('Price (Lakhs)')
    axes[0].legend()
    axes[0].grid(True)

    # Plot 2 - Squared errors bar chart
    axes[1].bar(range(1, 6), squared_errors, color='tomato', alpha=0.8)
    axes[1].set_title(f'Squared Errors per House (MSE = {mse_sklearn})')
    axes[1].set_xlabel('House')
    axes[1].set_ylabel('Squared Error')
    axes[1].grid(True, axis='y')

    plt.tight_layout()
    plt.savefig('mse_plot.png')
    plt.show()
    print("\nPlot saved!")


Output:

    === Manual Calculation ===
    Errors         : [5, -5, 2, -5, 5]
    Squared Errors : [25, 25, 4, 25, 25]
    MSE (manual)   : 20.8

    MSE (numpy)    : 20.8
    MSE (sklearn)  : 20.8

     Actual  Predicted  Error  Squared Error
         50         45      5             25
         80         85     -5             25
         60         58      2              4
         90         95     -5             25
         70         65      5             25

    === Outlier Effect Comparison ===
    Normal Model  → MAE: 4.40  | MSE: 20.80
    Outlier Model → MAE: 8.20 | MSE: 320.20

    Plot saved!


The Big Weakness of MSE — Units Get Weird

An MAE of 4.4 Lakhs means the model is wrong by ₹4.4L on average. Easy to understand.

An MSE of 20.8 → 20.8 what? Lakhs²? That unit makes no real sense.

Price is in Lakhs
Error is in Lakhs
Error² is in Lakhs² ← no human thinks in Lakhs squared

This is MSE's main weakness — not directly interpretable in real world terms.

Solution → RMSE (Root Mean Squared Error): just take the square root of MSE and the unit comes back to normal. That's the next concept.
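A quick preview of the fix, using the MSE from the walkthrough above:

```python
import math

mse  = 20.8               # from the walkthrough above, in Lakhs²
rmse = math.sqrt(mse)     # square root brings the unit back to plain Lakhs

print(f"RMSE = {rmse:.2f} Lakhs")   # RMSE = 4.56 Lakhs — interpretable again
```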


MAE vs MSE — Clear Comparison

                    MAE                      MSE
Formula             Average of |errors|      Average of errors²
Unit                Same as target           Squared unit
Big error penalty   Equal weight             Very heavy penalty
Outlier sensitive   No — robust              Yes — very sensitive
Use when            Outliers exist in data   Big mistakes are unacceptable
Differentiable      Not always               Always (good for gradient descent)


Why MSE is Loved in ML Training

This is important — MSE is not just an evaluation metric. It's often used as the loss function during model training itself.


    # Linear Regression internally minimizes MSE during training
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()  # this model minimizes MSE by default
    model.fit(X_train, y_train)

Why? Because MSE is smooth and differentiable everywhere — gradient descent can flow through it cleanly. MAE has a kink at 0 which causes issues in optimization.
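The kink can be seen by writing out the derivatives. For a single error e, d(e²)/de = 2e, which passes smoothly through zero, while d(|e|)/de = sign(e), which jumps from -1 to +1 at e = 0. A small numeric sketch:

```python
# Sketch: gradient of each penalty near zero error
# MSE gradient 2e shrinks smoothly as the error shrinks;
# MAE gradient sign(e) is stuck at ±1 no matter how small the error is
for e in [-0.1, -0.01, 0.01, 0.1]:
    grad_mse = 2 * e                 # d(e^2)/de — continuous through 0
    grad_mae = 1 if e > 0 else -1    # d(|e|)/de — jumps at 0 (the kink)
    print(f"e={e:>5}  MSE grad={grad_mse:>5.2f}  MAE grad={grad_mae:>3}")
```

This is why gradient descent on MSE naturally takes smaller steps as it closes in on the minimum, while the MAE gradient gives no such signal.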


Real ML Project Usage


    from sklearn.metrics import mean_squared_error, mean_absolute_error
    import numpy as np

    y_pred = model.predict(X_test)

    mae  = mean_absolute_error(y_test, y_pred)
    mse  = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)   # fix the unit problem

    print(f"MAE  : {mae:.2f}")   # avg error, human readable
    print(f"MSE  : {mse:.2f}")   # penalizes big errors, but weird unit
    print(f"RMSE : {rmse:.2f}")  # best of both worlds

    # Always report all three together in real projects

In real projects — always calculate MAE, MSE, and RMSE together. Each tells you something slightly different about your model's behavior.


One Line Summary

MSE squares every error before averaging — making it extremely sensitive to large mistakes, which is perfect when big prediction errors are costly and unacceptable in your use case.
