The Problem with SMA First
Remember SMA — it gives equal weight to all values in the window.
For a 3-day SMA on sales:
Day 1: 200, Day 2: 450, Day 3: 180
SMA = (200 + 450 + 180) / 3 = 276.67
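The equal weighting can be made explicit in a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative and not part of any library.

```python
# 3-day SMA over the example sales figures.
# Every value in the window gets the same weight: 1/3.
sales = [200, 450, 180]
weights = [1 / len(sales)] * len(sales)

sma = sum(w * x for w, x in zip(weights, sales))
print(round(sma, 2))  # 276.67
```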
Here, Day 1 (old data) and Day 3 (today) are treated equally. But think about it — should 2-day-old data matter as much as today's data?
In most real cases — No. Recent data is more important.
That's exactly what EMA fixes.
What is EMA?
EMA gives MORE weight to recent values and LESS weight to older values.
The further back a value is, the less it influences the average. Recent values dominate.
Weight Concept — Simple Visual
For a 3-period EMA, weights look like this:
| Data Point | Weight |
|---|---|
| Today (most recent) | Highest ⬆️ |
| Yesterday | Medium |
| Day before | Low |
| Even older | Very Low (almost ignored) |
Compare this to SMA where every day gets exactly equal weight.
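To put rough numbers on those labels: unrolling the recursive formula given later in this post implies that a value k days old carries a weight of α × (1 − α)^k. Here is a minimal sketch, assuming α = 0.5 (the N = 3 case); `ema_weights` is a hypothetical helper name used only for illustration.

```python
def ema_weights(alpha, periods):
    """Weight carried by the value k steps back, for k = 0 .. periods - 1.

    Comes from unrolling EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}.
    The very first observation in a series is an exception, because it
    seeds the EMA and so keeps a weight of (1 - alpha) ** (t - 1).
    """
    return [alpha * (1 - alpha) ** k for k in range(periods)]


labels = ["Today", "Yesterday", "Day before", "Even older"]
weights = ema_weights(alpha=0.5, periods=4)

for label, w in zip(labels, weights):
    print(f"{label:<12} {w:.4f}")
# Today        0.5000
# Yesterday    0.2500
# Day before   0.1250
# Even older   0.0625
```

Each step back halves the weight when α = 0.5; a smaller α makes the weights fall off more slowly.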
The Formula
EMA today = (Today's Value × α) + (Yesterday's EMA × (1 - α))
Where α (alpha) is the smoothing factor:
α = 2 / (N + 1)
For N = 3:
α = 2 / (3 + 1) = 0.5
That means — 50% weight to today, 50% to the past EMA.
For N = 10:
α = 2 / (10 + 1) ≈ 0.18
A smaller alpha gives a smoother line: each new value moves the average less, so older data keeps more of its influence.
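The relationship between N and α is easy to tabulate. The sketch below assumes the α = 2 / (N + 1) convention used in this post; `smoothing_factor` is a hypothetical helper name.

```python
def smoothing_factor(n):
    """Return alpha = 2 / (N + 1) for a span of N periods."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return 2 / (n + 1)


for n in (3, 7, 10, 21):
    print(f"N={n:>2}  alpha={smoothing_factor(n):.4f}")
# N= 3  alpha=0.5000
# N= 7  alpha=0.2500
# N=10  alpha=0.1818
# N=21  alpha=0.0909
```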
Manual Walkthrough — Step by Step
Daily Sales data:
| Day | Sales |
|---|---|
| 1 | 200 |
| 2 | 450 |
| 3 | 180 |
| 4 | 500 |
| 5 | 220 |
Using N = 3, so α = 0.5
Step 1 — Day 1: No previous EMA exists, so EMA = first value itself
EMA(1) = 200
Step 2 — Day 2:
EMA(2) = (450 × 0.5) + (200 × 0.5)
= 225 + 100
= 325
Step 3 — Day 3:
EMA(3) = (180 × 0.5) + (325 × 0.5)
= 90 + 162.5
= 252.5
Step 4 — Day 4:
EMA(4) = (500 × 0.5) + (252.5 × 0.5)
= 250 + 126.25
= 376.25
Step 5 — Day 5:
EMA(5) = (220 × 0.5) + (376.25 × 0.5)
= 110 + 188.125
= 298.125
Final result:
| Day | Sales | EMA (N=3) |
|---|---|---|
| 1 | 200 | 200 |
| 2 | 450 | 325 |
| 3 | 180 | 252.5 |
| 4 | 500 | 376.25 |
| 5 | 220 | 298.125 |
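Notice how the EMA jumps from 252.5 to 376.25 when sales spike to 500 on Day 4, a move of 123.75 in a single step. The 3-day SMA moves less over the same step, from 276.67 to 376.67. That's the power.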
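The manual steps can be reproduced with a short loop. This is a sketch in plain Python, assuming the same seeding rule as Step 1 (the first EMA equals the first value); `ema_series` is a hypothetical helper name.

```python
def ema_series(values, n):
    """Recursive EMA seeded with the first value, with alpha = 2 / (n + 1)."""
    if not values:
        return []
    alpha = 2 / (n + 1)
    result = [values[0]]  # Step 1: no previous EMA, so use the first value
    for x in values[1:]:
        # EMA today = today's value * alpha + yesterday's EMA * (1 - alpha)
        result.append(alpha * x + (1 - alpha) * result[-1])
    return result


sales = [200, 450, 180, 500, 220]
ema = ema_series(sales, n=3)
print(ema)  # [200, 325.0, 252.5, 376.25, 298.125]
```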
SMA vs EMA — Side by Side
| Feature | SMA | EMA |
|---|---|---|
| Weight to all values | Equal | More to recent |
| Reacts to sudden change | Slow | Fast |
| Smoother line | Yes | Slightly less smooth |
| NaN at start | Yes (first N − 1 rows) | No |
| Best for | Long-term trend | Short-term, fast signals |
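The difference at the start of the series is easy to check in pandas. A small sketch on the five-day sales data from the walkthrough, counting how many leading rows each method leaves empty with default settings:

```python
import pandas as pd

sales = pd.Series([200, 450, 180, 500, 220])

# rolling() needs a full window before it returns a value
sma_3 = sales.rolling(window=3).mean()
# ewm() with adjust=False starts from the first value
ema_3 = sales.ewm(span=3, adjust=False).mean()

print(pd.DataFrame({"sales": sales, "SMA_3": sma_3, "EMA_3": ema_3}))

nan_count_sma = int(sma_3.isna().sum())
nan_count_ema = int(ema_3.isna().sum())
print(nan_count_sma, nan_count_ema)  # 2 0
```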
Python Program
import pandas as pd
import matplotlib.pyplot as plt
# --- Data ---
data = {
    'day': list(range(1, 16)),
    'sales': [200, 450, 180, 500, 220, 480, 210, 460, 190, 510, 230, 490, 200, 470, 215]
}
df = pd.DataFrame(data)
# --- Calculate SMA and EMA ---
df['SMA_3'] = df['sales'].rolling(window=3).mean()
df['EMA_3'] = df['sales'].ewm(span=3, adjust=False).mean() # EMA with N=3
df['EMA_7'] = df['sales'].ewm(span=7, adjust=False).mean() # EMA with N=7
print(df.to_string(index=False))
# --- Plot ---
plt.figure(figsize=(12, 5))
plt.plot(df['day'], df['sales'], label='Raw Sales', marker='o', linewidth=1.5, alpha=0.6)
plt.plot(df['day'], df['SMA_3'], label='SMA (3-day)', linewidth=2, linestyle='--')
plt.plot(df['day'], df['EMA_3'], label='EMA (3-day)', linewidth=2)
plt.plot(df['day'], df['EMA_7'], label='EMA (7-day)', linewidth=2)
plt.title('SMA vs EMA Comparison')
plt.xlabel('Day')
plt.ylabel('Sales')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('ema_vs_sma.png')
plt.show()
print("Plot saved!")
Key Things to Remember
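ewm(span=3): span is the N in α = 2 / (N + 1). It plays a similar role to window in rolling, but an EMA has no hard cutoff, so older values fade out gradually instead of dropping off.
adjust=False: uses the recursive formula shown above. The pandas default is adjust=True, which normalizes the weights and gives different values in the first few rows, so set adjust=False when you want results that match the manual walkthrough.
No NaN: EMA starts from Day 1 itself, unlike rolling(window=N), which returns NaN until it has N values.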
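To see what the adjust flag changes, the sketch below runs both settings on the five-day walkthrough data. Only adjust=False reproduces the hand-calculated values; adjust=True divides by the sum of the weights seen so far, which matters most in the early rows.

```python
import pandas as pd

sales = pd.Series([200, 450, 180, 500, 220])

recursive = sales.ewm(span=3, adjust=False).mean()  # matches the manual steps
adjusted = sales.ewm(span=3, adjust=True).mean()    # pandas default behaviour

comparison = pd.DataFrame({
    "sales": sales,
    "adjust=False": recursive,
    "adjust=True": adjusted,
})
print(comparison)

# Day 2, adjust=False: 0.5 * 450 + 0.5 * 200 = 325
# Day 2, adjust=True:  (450 + 0.5 * 200) / (1 + 0.5) = 366.67 (rounded)
```

Both columns agree on Day 1 and drift closer together as more rows accumulate, so the choice mainly affects short series and the first few rows.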
Where EMA is Used in ML
# Feature Engineering with EMA
df['ema_3'] = df['sales'].ewm(span=3, adjust=False).mean() # short trend
df['ema_7'] = df['sales'].ewm(span=7, adjust=False).mean() # medium trend
df['ema_21'] = df['sales'].ewm(span=21, adjust=False).mean() # long trend
# EMA reacts faster — great for detecting sudden changes (fraud, anomaly)
df['deviation_from_ema'] = df['sales'] - df['ema_7'] # how far today is from trend
deviation_from_ema measures how far today's value sits from the recent trend. A value far above or below zero signals that something unusual may be happening, which is why features like this are common in anomaly detection and fraud detection models.
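One simple way to turn this feature into a flag is to compare it against a cutoff. The sketch below reuses the 15-day sales data from the program above; the cutoff of 1.5 standard deviations and the column name `is_unusual` are assumptions chosen for illustration, not a recommended setting.

```python
import pandas as pd

sales = [200, 450, 180, 500, 220, 480, 210, 460, 190, 510, 230, 490, 200, 470, 215]
df = pd.DataFrame({"day": range(1, 16), "sales": sales})

df["ema_7"] = df["sales"].ewm(span=7, adjust=False).mean()
df["deviation_from_ema"] = df["sales"] - df["ema_7"]

# Hypothetical cutoff: 1.5 standard deviations of the deviation itself.
# A real model would tune this or learn it from labelled data.
threshold = 1.5 * df["deviation_from_ema"].std()
df["is_unusual"] = df["deviation_from_ema"].abs() > threshold

print(df[df["is_unusual"]])
```

Day 1 is never flagged here, because the EMA is seeded with the first value and the deviation on that day is zero.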
One Line Summary
EMA is a smarter Moving Average — it remembers the past but pays more attention to what just happened, making it faster to react to real changes in data.