Mean Squared Error (MSE)

Start With The Problem — Where MAE Falls Short

Same house price example. Two different models:

House   Actual   Model A Predicted   Model B Predicted
1       50       49                  50
2       80       79                  80
3       60       59                  60
4       90       79                  89
5       70       69                  30


Model A errors: 1, 1, 1, 11, 1 → MAE = 3.0

Model B errors: 0, 0, 0, 1, 40 → MAE = 8.2

OK, so MAE clearly shows Model B is worse here. But now imagine both models had the same MAE of 5:

Model A errors: 5, 5, 5, 5, 5 → MAE = 5

Model B errors: 1, 1, 1, 1, 21 → MAE = 5 ← one massive mistake!

MAE says both are equal. But clearly Model B is dangerous — it made one huge blunder. You want to catch and punish big errors harder.

That's exactly what MSE does.
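A quick sketch, using the tied-MAE error lists above, shows how squaring separates the two models:

```python
# Two models with identical MAE but very different error profiles
errors_a = [5, 5, 5, 5, 5]    # consistent small errors
errors_b = [1, 1, 1, 1, 21]   # one massive mistake

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e ** 2 for e in errors) / len(errors)

print(f"Model A → MAE: {mae(errors_a)}  MSE: {mse(errors_a)}")  # MAE: 5.0  MSE: 25.0
print(f"Model B → MAE: {mae(errors_b)}  MSE: {mse(errors_b)}")  # MAE: 5.0  MSE: 89.0
```

MAE ties both models at 5.0; MSE immediately flags Model B's blunder.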


What is MSE?

MSE = Average of SQUARED differences between actual and predicted values

Same 3 steps as MAE but with one change:

  1. Find the error (Actual − Predicted)
  2. Square each error (instead of absolute value)
  3. Take the average

Squaring does two things — makes everything positive AND punishes big errors much harder.


The Formula

MSE = (1/n) × Σ (Actual - Predicted)²

Same as the MAE formula, just with a square instead of an absolute value.


Manual Walkthrough — Step by Step

House   Actual   Predicted   Error   Error²
1       50       45           5       25
2       80       85          -5       25
3       60       58           2        4
4       90       95          -5       25
5       70       65           5       25

Step 1 — Sum of squared errors:

25 + 25 + 4 + 25 + 25 = 104

Step 2 — Divide by n (5 houses):

MSE = 104 / 5 = 20.8

Result: MSE = 20.8


The Squaring Effect — This is the KEY idea

See what squaring does to errors of different sizes:

Error   After Absolute (MAE)   After Squaring (MSE)
1       1                      1
2       2                      4
5       5                      25
10      10                     100
20      20                     400

A 2x bigger error gets a 4x bigger penalty in MSE.

A 10x bigger error gets a 100x bigger penalty in MSE.

This is why MSE is said to heavily penalize large errors. Small errors barely matter; big errors scream loudly.


Visual — MAE vs MSE on Same Error

Error = 1  →  MAE adds 1    |  MSE adds 1
Error = 2  →  MAE adds 2    |  MSE adds 4
Error = 5  →  MAE adds 5    |  MSE adds 25
Error = 10 →  MAE adds 10   |  MSE adds 100  ← huge jump
Error = 20 →  MAE adds 20   |  MSE adds 400  ← MSE going crazy
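The comparison above can be generated in a couple of lines:

```python
# Penalty growth: absolute value (MAE) vs squaring (MSE), per error size
for e in [1, 2, 5, 10, 20]:
    print(f"Error = {e:2d} → MAE adds {abs(e):2d} | MSE adds {e ** 2}")
```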

A model with one huge mistake will have a massive MSE even if all its other predictions are perfect.


Python Program


    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    import matplotlib.pyplot as plt

    # --- Data ---
    actual    = [50, 80, 60, 90, 70]
    predicted = [45, 85, 58, 95, 65]

    # --- Manual Calculation ---
    errors         = [a - p for a, p in zip(actual, predicted)]
    squared_errors = [e**2 for e in errors]
    mse_manual     = sum(squared_errors) / len(squared_errors)

    print("=== Manual Calculation ===")
    print(f"Errors         : {errors}")
    print(f"Squared Errors : {squared_errors}")
    print(f"MSE (manual)   : {mse_manual}")

    # --- Using NumPy ---
    mse_numpy = np.mean((np.array(actual) - np.array(predicted))**2)
    print(f"\nMSE (numpy)    : {mse_numpy}")

    # --- Using Scikit-learn ---
    mse_sklearn = mean_squared_error(actual, predicted)
    print(f"MSE (sklearn)  : {mse_sklearn}")

    # --- DataFrame view ---
    df = pd.DataFrame({
        'Actual'         : actual,
        'Predicted'      : predicted,
        'Error'          : errors,
        'Squared Error'  : squared_errors
    })
    print(f"\n{df.to_string(index=False)}")

    # --- Comparing MAE vs MSE on a bad outlier model ---
    actual2    = [50, 80, 60, 90, 70]
    predicted2 = [50, 80, 60, 89, 30]   # last prediction is way off

    from sklearn.metrics import mean_absolute_error
    print("\n=== Outlier Effect Comparison ===")
    print(f"Normal Model  → MAE: {mean_absolute_error(actual, predicted):.2f}  | MSE: {mean_squared_error(actual, predicted):.2f}")
    print(f"Outlier Model → MAE: {mean_absolute_error(actual2, predicted2):.2f} | MSE: {mean_squared_error(actual2, predicted2):.2f}")

    # --- Plot ---
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Plot 1 - Actual vs Predicted
    axes[0].plot(range(1, 6), actual,    label='Actual',    marker='o', linewidth=2)
    axes[0].plot(range(1, 6), predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')
    for i in range(5):
        axes[0].vlines(i+1, min(actual[i], predicted[i]),
                            max(actual[i], predicted[i]),
                            colors='red', linewidth=2, alpha=0.6)
    axes[0].set_title('Actual vs Predicted')
    axes[0].set_xlabel('House')
    axes[0].set_ylabel('Price (Lakhs)')
    axes[0].legend()
    axes[0].grid(True)

    # Plot 2 - Squared errors bar chart
    axes[1].bar(range(1, 6), squared_errors, color='tomato', alpha=0.8)
    axes[1].set_title(f'Squared Errors per House (MSE = {mse_sklearn})')
    axes[1].set_xlabel('House')
    axes[1].set_ylabel('Squared Error')
    axes[1].grid(True, axis='y')

    plt.tight_layout()
    plt.savefig('mse_plot.png')
    plt.show()
    print("\nPlot saved!")


Output:

=== Manual Calculation ===
Errors         : [5, -5, 2, -5, 5]
Squared Errors : [25, 25, 4, 25, 25]
MSE (manual)   : 20.8

MSE (numpy)    : 20.8
MSE (sklearn)  : 20.8

 Actual  Predicted  Error  Squared Error
     50         45      5             25
     80         85     -5             25
     60         58      2              4
     90         95     -5             25
     70         65      5             25

=== Outlier Effect Comparison ===
Normal Model  → MAE: 4.40  | MSE: 20.80
Outlier Model → MAE: 8.20 | MSE: 320.20

Plot saved!


The Big Weakness of MSE — Units Get Weird

MAE of 4.4 Lakhs → means model is wrong by ₹4.4L on average. Easy to understand.

MSE of 20.8 → 20.8 what? Lakhs²? That unit makes no real sense.

Price is in Lakhs
Error is in Lakhs
Error² is in Lakhs² ← no human thinks in Lakhs squared

This is MSE's main weakness — not directly interpretable in real world terms.

Solution → RMSE (Root Mean Squared Error): just take the square root of MSE and the unit comes back to normal. That's the next concept.
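A minimal sketch of that fix, reusing the MSE of 20.8 from the walkthrough:

```python
import math

mse = 20.8                # in Lakhs squared, hard to interpret
rmse = math.sqrt(mse)     # back to plain Lakhs

print(f"MSE  = {mse} Lakhs²")
print(f"RMSE = {rmse:.2f} Lakhs")  # about 4.56, directly comparable to the MAE of 4.4
```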


MAE vs MSE — Clear Comparison

                     MAE                      MSE
Formula              Average of |errors|      Average of errors²
Unit                 Same as target           Squared unit
Big error penalty    Equal weight             Very heavy penalty
Outlier sensitive    No — robust              Yes — very sensitive
Use when             Outliers exist in data   Big mistakes are unacceptable
Differentiable       Not at 0 (kink)          Everywhere (good for gradient descent)


Why MSE is Loved in ML Training

This is important — MSE is not just an evaluation metric. It's often used as the loss function during model training itself.


    # Linear Regression internally minimizes MSE during training
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()  # this model minimizes MSE by default
    model.fit(X_train, y_train)

Why? Because MSE is smooth and differentiable everywhere, so gradient descent can flow through it cleanly. MAE has a kink at 0, which causes issues in optimization.
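A numerical sketch of why the kink matters: for a single prediction p against an actual value a, the derivative of (a − p)² shrinks smoothly to 0 as the error shrinks, while the derivative of |a − p| stays at ±1 and flips abruptly at zero error:

```python
# Central-difference gradients of an MSE-style vs MAE-style loss
# for one prediction p against an actual value a = 10
def mse_loss(p, a=10.0):
    return (a - p) ** 2

def mae_loss(p, a=10.0):
    return abs(a - p)

def grad(f, p, h=1e-6):
    # numerical derivative of f at p
    return (f(p + h) - f(p - h)) / (2 * h)

for p in [5.0, 9.0, 9.9, 10.1, 11.0]:
    print(f"p = {p:4.1f}  MSE grad: {grad(mse_loss, p):7.2f}  MAE grad: {grad(mae_loss, p):5.2f}")
```

The MSE gradient gives gradient descent a step size proportional to the remaining error; the MAE gradient carries no magnitude information and is undefined exactly at zero error.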


Real ML Project Usage


    from sklearn.metrics import mean_squared_error, mean_absolute_error
    import numpy as np

    y_pred = model.predict(X_test)

    mae  = mean_absolute_error(y_test, y_pred)
    mse  = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)   # fix the unit problem

    print(f"MAE  : {mae:.2f}")   # avg error, human readable
    print(f"MSE  : {mse:.2f}")   # penalizes big errors, but weird unit
    print(f"RMSE : {rmse:.2f}")  # best of both worlds

    # Always report all three together in real projects

In real projects — always calculate MAE, MSE, and RMSE together. Each tells you something slightly different about your model's behavior.


One Line Summary

MSE squares every error before averaging — making it extremely sensitive to large mistakes, which is perfect when big prediction errors are costly and unacceptable in your use case.

Mean Absolute Error (MAE)

Start With The Problem

You trained a house price prediction model. Now you want to know — how good is my model?

Your model made these predictions:

House   Actual Price   Predicted Price
1       ₹50 L          ₹45 L
2       ₹80 L          ₹85 L
3       ₹60 L          ₹58 L
4       ₹90 L          ₹95 L
5       ₹70 L          ₹65 L

You need one single number that tells you — on average, by how much is my model wrong?

That number is MAE.


What is MAE?

MAE = Average of absolute differences between actual and predicted values

Three steps only:

  1. Find the error (Actual − Predicted) for each row
  2. Make all errors positive (take absolute value)
  3. Take the average

The Formula

MAE = (1/n) × Σ |Actual - Predicted|
  • n = total number of predictions
  • | | = absolute value (just remove the minus sign)
  • Σ = sum of everything

Manual Walkthrough — Step by Step

House   Actual   Predicted   Error (A−P)   |Error|
1       50       45          +5            5
2       80       85          -5            5
3       60       58          +2            2
4       90       95          -5            5
5       70       65          +5            5

Step 1 — Sum of absolute errors:

5 + 5 + 2 + 5 + 5 = 22

Step 2 — Divide by n (5 houses):

MAE = 22 / 5 = 4.4

Result: MAE = 4.4 Lakhs

This means — on average, your model is wrong by ₹4.4 Lakhs per house. Simple and clear.


Why Absolute Value? Why Not Just Average the Errors?

Without absolute value:

Errors = +5, -5, +2, -5, +5
Sum    = +5 - 5 + 2 - 5 + 5 = 2
Avg    = 2 / 5 = 0.4

This says the model is almost perfect — but it's clearly not! Positive and negative errors cancel each other out and give a false picture.

Absolute value fixes this — every error counts as positive, no cancellation.
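The cancellation is easy to verify with the error list above:

```python
errors = [5, -5, 2, -5, 5]

plain_mean = sum(errors) / len(errors)           # signed errors cancel out
mae = sum(abs(e) for e in errors) / len(errors)  # absolute value stops the cancellation

print(f"Plain average of errors: {plain_mean}")  # 0.4, looks near-perfect
print(f"MAE                    : {mae}")         # 4.4, the real picture
```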


Python Program


    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_absolute_error
    import matplotlib.pyplot as plt

    # --- Data ---
    actual    = [50, 80, 60, 90, 70]
    predicted = [45, 85, 58, 95, 65]

    # --- Manual Calculation ---
    errors          = [a - p for a, p in zip(actual, predicted)]
    absolute_errors = [abs(e) for e in errors]
    mae_manual      = sum(absolute_errors) / len(absolute_errors)

    print("=== Manual Calculation ===")
    print(f"Errors          : {errors}")
    print(f"Absolute Errors : {absolute_errors}")
    print(f"MAE (manual)    : {mae_manual}")

    # --- Using NumPy ---
    mae_numpy = np.mean(np.abs(np.array(actual) - np.array(predicted)))
    print(f"\nMAE (numpy)     : {mae_numpy}")

    # --- Using Scikit-learn ---
    mae_sklearn = mean_absolute_error(actual, predicted)
    print(f"MAE (sklearn)   : {mae_sklearn}")

    # --- DataFrame view ---
    df = pd.DataFrame({
        'Actual'         : actual,
        'Predicted'      : predicted,
        'Error'          : errors,
        'Absolute Error' : absolute_errors
    })
    print(f"\n{df.to_string(index=False)}")

    # --- Plot ---
    x = range(1, len(actual) + 1)
    plt.figure(figsize=(10, 5))
    plt.plot(x, actual,    label='Actual',    marker='o', linewidth=2)
    plt.plot(x, predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')

    for i in x:
        plt.vlines(i, min(actual[i-1], predicted[i-1]),
                    max(actual[i-1], predicted[i-1]),
                    colors='red', linewidth=2, alpha=0.6)

    plt.title(f'Actual vs Predicted (MAE = {mae_sklearn})')
    plt.xlabel('House')
    plt.ylabel('Price (Lakhs)')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.savefig('mae_plot.png')
    plt.show()
    print("\nPlot saved!")

Output:

=== Manual Calculation ===

Errors          : [5, -5, 2, -5, 5]

Absolute Errors : [5, 5, 2, 5, 5]

MAE (manual)    : 4.4


MAE (numpy)     : 4.4

MAE (sklearn)   : 4.4


 Actual  Predicted  Error  Absolute Error

     50         45      5               5

     80         85     -5               5

     60         58      2               2

     90         95     -5               5

     70         65      5               5


Plot saved!

The red vertical lines in the plot show the error for each prediction — MAE is just the average length of those red lines.


How to Read MAE

MAE is in the same unit as your target variable.

Target                 MAE = 4.4 means
House Price (Lakhs)    Wrong by ₹4.4L on average
Temperature (°C)       Wrong by 4.4°C on average
Sales (units)          Wrong by 4.4 units on average

This is MAE's biggest strength — it's directly interpretable. No unit conversion needed.


MAE vs Other Metrics — When to Use What

Metric   Penalizes Big Errors?   Interpretable?    Use When
MAE      No (equal weight)       Yes, same unit    Outliers exist; you want simple average error
MSE      Yes (squares errors)    Squared unit      Big errors are very bad; penalize them hard
RMSE     Yes                     Yes, same unit    Big errors are bad but you want an interpretable result


The One Weakness of MAE

MAE treats all errors equally. A ₹2L error and a ₹20L error both just get added as-is.

Error of 2  → contributes 2
Error of 20 → contributes 20

If you want your model to heavily penalize large mistakes, use MSE or RMSE instead. But if outliers exist in your data and you don't want them to dominate the metric — MAE is safer.


Real ML Project Usage


    from sklearn.metrics import mean_absolute_error
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # After training your model
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = LinearRegression()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    # Evaluate
    mae = mean_absolute_error(y_test, y_pred)
    print(f"Model MAE: {mae:.2f}")

    # Rule of thumb — MAE should be small relative to your target's range
    print(f"Target range : {y_test.max() - y_test.min():.2f}")
    print(f"MAE as % of range: {(mae / (y_test.max() - y_test.min())) * 100:.1f}%")

MAE as % of range is a great sanity check — if MAE is 5% of the range, your model is decent. If it's 40%, your model needs work.


One Line Summary

MAE tells you — on average, by how much is your model's prediction off from the real value — in the same unit as your data, making it the most human-readable regression metric.
