A Sunny Day, Broken Data, and You Had No Idea
1. A Regular Morning at a Startup
At 8 AM, you open the tracking dashboard like always.
All ETL jobs from the previous day show success, at least according to the latest report.
Then Slack starts buzzing.
The PM sends alerts:
“Hey, why is all the TikTok data missing from yesterday’s report?”
“And what’s with Tuesday’s campaign revenue showing as zero?”
You open the ETL logs, check Snowflake, dig through S3, and spot something odd.
It seems the pipeline ran successfully, but the data was incomplete, and no one noticed.
Maybe the data had been wrong for days, and the system didn’t say a word.
And so the story begins…
2. What’s “Normal”? What’s Not?
In data systems, an anomaly is any value or pattern that breaks from what the data normally looks like. Three types come up most often:
Point anomaly: Revenue = 0 on a day that normally hits hundreds of millions.
Contextual anomaly: Clicks on Friday drop below Wednesday’s, which reverses the typical trend.
Collective anomaly: Traffic drops 20% for five straight days, but no one notices because each day looks “fine” in isolation.
The key question: How do you know when a data point is abnormal, if no one tells you?
3. Bad Data Isn’t the Scary Part; Not Knowing It’s Bad Is
As your system scales:
from a few tables to hundreds,
from one pipeline to dozens,
from a single data source to many,
… you can no longer rely on visual checks or user complaints.
You need a digital sixth sense, something that constantly watches your data and knows when it strays from expectations.
4. How Can You Detect Anomalies?
Manual Rules:
“If record count < 1,000, send an alert.”
“If today’s revenue deviates by more than 50% from yesterday’s, send a Slack message.”
But seasonality exists: traffic drops every Sunday, and orders spike at month-end.
Static rules easily misfire or miss the issue altogether.
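To make that concrete, here is a rough sketch of the two rules above in Python. The function name, argument names, and default thresholds are made up for illustration; they only mirror the numbers quoted in the rules.

```python
def static_rule_alerts(record_count, revenue_today, revenue_yesterday,
                       min_records=1_000, max_change=0.50):
    """Return alert messages for the two static rules quoted above."""
    alerts = []
    # Rule 1: too few records loaded.
    if record_count < min_records:
        alerts.append(f"record count {record_count} is below {min_records}")
    # Rule 2: revenue moved more than max_change versus yesterday.
    if revenue_yesterday > 0:
        change = abs(revenue_today - revenue_yesterday) / revenue_yesterday
        if change > max_change:
            alerts.append(f"revenue changed {change:.0%} versus yesterday")
    return alerts


# A routine weekend dip trips the revenue rule even though nothing is broken.
print(static_rule_alerts(record_count=5_000, revenue_today=40, revenue_yesterday=100))
```

The rule has no idea that a 60% drop can be perfectly normal on some days, which is exactly how static thresholds end up misfiring.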
Statistical Methods:
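Z-score: Measures how many standard deviations a point lies from the mean.
Standard deviation: Measures the usual spread of the data, so values that fall well outside it can be flagged.
Rolling average: Averages a recent window of values to smooth short-term noise, then compares new points against it.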
Exponential smoothing: Gives more weight to recent data to track short-term changes.
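For reference, the usual textbook forms of these measures look roughly like this. The symbols are chosen here for illustration: x_t is the value at time t, μ and σ are the mean and standard deviation of the history, w is the window size, and α is the smoothing factor.

```latex
z_t = \frac{x_t - \mu}{\sigma}, \qquad
m_t = \frac{1}{w} \sum_{i=0}^{w-1} x_{t-i}, \qquad
s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}, \quad 0 < \alpha \le 1
```

A common convention, though not a universal one, is to flag a point when the absolute z-score exceeds a fixed threshold such as 3.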
Smarter Techniques:
Holt-Winters: Models trend and seasonality for more accurate predictions.
Isolation Forest, autoencoders, Prophet: Use learned models (tree-based isolation, reconstruction error, and an additive forecasting model, respectively) to detect more complex patterns.
5. A Practical Example
Let’s take one example, daily Facebook Ads clicks, and test it with different anomaly detection methods.
The goal: see what each method can catch, and where it falls short.
The Context:
You’re tracking daily click counts for the past 21 days:
Day | Clicks |
---|---|
Mon W1 | 4100 |
Tue W1 | 4300 |
Wed W1 | 4350 |
Thu W1 | 4200 |
Fri W1 | 4400 |
Sat W1 | 2800 |
Sun W1 | 2500 |
Mon W2 | 4150 |
Tue W2 | 4350 |
Wed W2 | 4400 |
Thu W2 | 4300 |
Fri W2 | 4450 |
Sat W2 | 2700 |
Sun W2 | 2600 |
Mon W3 | 4180 |
Tue W3 | 4320 |
Wed W3 | 3450 ❗️– unusually low |
Thu W3 | 4290 |
Fri W3 | 4430 |
Sat W3 | 2750 |
Sun W3 | 2580 |
Goal:
Is Wednesday Week 3 (3450 clicks) an anomaly?
1. Overall Mean
Overall mean ≈ 3790 clicks
3450 is below the mean by ~340 clicks (about 9%)
But it’s still higher than Saturday or Sunday
Limitation: Doesn’t account for the fact that Wednesdays are usually high
Strength: Easiest to implement
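Here is a minimal sketch of that check using only the standard library. The "flag when |z| is above 3" rule is an assumed convention, not something the data dictates.

```python
from statistics import mean, stdev

# The 21 daily click counts from the table above (Mon W1 .. Sun W3).
clicks = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,
    4150, 4350, 4400, 4300, 4450, 2700, 2600,
    4180, 4320, 3450, 4290, 4430, 2750, 2580,
]
suspect = 3450  # Wed W3

overall_mean = mean(clicks)
z = (suspect - overall_mean) / stdev(clicks)  # sample standard deviation

print(f"overall mean: {overall_mean:.0f}")
print(f"gap: {overall_mean - suspect:.0f} clicks, z-score: {z:.2f}")
# Assumed rule of thumb: only flag points with |z| above 3.
print("flagged" if abs(z) > 3 else "not flagged")
```

Because the weekend lows inflate the overall spread, the suspect Wednesday sits well inside it and slips through.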
2. Recent Mean (Last 2–3 Days)
Monday W3: 4180
Tuesday W3: 4320
Average of the last two days = 4250
Wednesday W3 = 3450 → ~800 clicks lower (~19%)
Strength: Detects short-term shifts
Limitation: Ignores day-of-week patterns, so it might misread normal variations
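The same arithmetic as a short sketch. The 15% alert threshold is a hypothetical value picked for the example.

```python
recent = [4180, 4320]   # Mon W3, Tue W3
today = 3450            # Wed W3

baseline = sum(recent) / len(recent)
drop = (baseline - today) / baseline

THRESHOLD = 0.15  # hypothetical: alert when today is 15% below the recent mean
print(f"baseline {baseline:.0f}, drop {drop:.1%}, alert: {drop > THRESHOLD}")
```

Run the same check on a Saturday and it would fire every week, since Friday is always far above the weekend.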
3. Standard Deviation (Same Weekday)
Wednesdays:
W1: 4350
W2: 4400
W3: 3450
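Mean of the two earlier Wednesdays (W1, W2) = 4375
Deviation of W3 from that mean ≈ 925 (~21%)
The earlier Wednesdays differ by only 50 clicks, so a 925-click drop is far outside their usual spread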
Strength: High precision with stable patterns
Limitation: Sensitive to outliers
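A small sketch of the same-weekday comparison, treating the two earlier Wednesdays as the baseline (their mean works out to 4375). With only two prior points the spread estimate is very shaky, so take the z-score as an illustration rather than a calibrated number.

```python
from statistics import mean, stdev

prior_wednesdays = [4350, 4400]  # W1, W2
today = 3450                     # Wed W3

baseline = mean(prior_wednesdays)
spread = stdev(prior_wednesdays)  # sample standard deviation of the baseline
z = (today - baseline) / spread

print(f"baseline {baseline}, spread {spread:.1f}, z-score {z:.1f}")
```

In practice you would want several more weeks of history per weekday before trusting the spread.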
4. Exponential Smoothing
Predicts a smoothed value near 4380 for that day
3450 is far off → Flagged as anomaly
Strength: Smooths noise, responds to shifts
Limitation: Doesn’t understand weekday patterns
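For the curious, simple exponential smoothing fits in a few lines. This is a sketch under stated assumptions: the smoothing factor alpha = 0.5 is an arbitrary pick, and weekends are left in the series. The forecast it prints depends heavily on both choices and will not necessarily match the figure quoted above.

```python
def smooth(series, alpha):
    """Simple exponential smoothing. Returns the final smoothed value,
    which doubles as the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Mon W1 .. Tue W3: everything observed before the suspect Wednesday.
history = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,
    4150, 4350, 4400, 4300, 4450, 2700, 2600,
    4180, 4320,
]
forecast = smooth(history, alpha=0.5)  # alpha is an assumed value
gap = (forecast - 3450) / forecast
print(f"forecast {forecast:.0f}, Wed W3 is {gap:.0%} below it")
```

Notice how the weekend lows drag the smoothed level down before each new week: that is the weekday blindness in action.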
5. Holt-Winters (With Trend + Seasonality)
Learns that Wed–Fri are peak days
Predicts Wednesday clicks around 4370–4400
3450 is roughly 21–22% below that forecast → Clear anomaly
Strength:
Recognizes normal high days
Ignores predictable lows (e.g., Sunday)
Knows when deviation is worth an alert
Limitation: Needs enough history to cover several full weekly cycles, plus parameter setup
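To give a feel for the mechanics, here is a hand-rolled additive Holt-Winters sketch with a 7-day season. It is an illustration, not a production implementation: the function name, the smoothing parameters (alpha, beta, gamma), and the simple first-week initialization are all assumptions made for this example.

```python
def holt_winters_forecast(series, period=7, alpha=0.3, beta=0.1, gamma=0.3):
    """One-step-ahead additive Holt-Winters forecast.

    Needs at least two full periods of data for this initialization.
    """
    first, second = series[:period], series[period:2 * period]
    level = sum(first) / period                       # average of week 1
    trend = (sum(second) / period - level) / period   # per-day drift, week 1 to 2
    seasonal = [x - level for x in first]             # day-of-week offsets

    for t in range(period, len(series)):
        x, s_old = series[t], seasonal[t % period]
        prev_level = level
        level = alpha * (x - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (x - level) + (1 - gamma) * s_old

    # Next day = level + trend + that weekday's seasonal offset.
    return level + trend + seasonal[len(series) % period]


# Mon W1 .. Tue W3: everything observed before the suspect Wednesday.
history = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,
    4150, 4350, 4400, 4300, 4450, 2700, 2600,
    4180, 4320,
]
forecast = holt_winters_forecast(history)
drop = (forecast - 3450) / forecast
print(f"Wed W3 forecast {forecast:.0f}, actual is {drop:.0%} below it")
```

Because each weekday carries its own offset, the weekend lows no longer pull the Wednesday forecast down.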
Summary Comparison
Method | Detects Anomaly? | Aware of Day Patterns? | Accuracy |
---|---|---|---|
Overall Mean | No | No | Low |
Recent Mean | Somewhat | No | Medium |
Std Dev (Same Day) | Yes | Yes | High |
Smoothing | Yes | No | High |
Holt-Winters | Yes | Yes | Very High |
6. Final Thought – Let Your Data Tell You When Something’s Off
You can’t write a rule for every scenario.
You can’t watch dashboards all day.
You need a smart model that understands:
What today’s data should look like
Whether deviations are real problems
When to raise the alarm
That’s why anomaly detection exists: to guard your data systems like a silent sentry.
🔜 Next Up…
We’ll dive into one specific model: Holt-Winters, the hero for data with clear trends and cycles.
And we’ll use real Facebook Ads data to show how it works.