A Sunny Day, Broken Data, and You Had No Idea

Jun 3, 2025

1. A Regular Morning at a Startup

At 8 AM, you open the tracking dashboard like always.
All ETL jobs from the previous day show success, at least according to the latest report.
Then Slack starts buzzing.

The PM sends alerts:

“Hey, why is all the TikTok data missing from yesterday’s report?”
“And what’s with Tuesday’s campaign revenue showing as zero?”

You open the ETL logs, check Snowflake, dig through S3, and spot something odd.

It seems the pipeline ran successfully, but the data was incomplete, and no one noticed.
Maybe the data had been wrong for days, and the system didn’t say a word.

And so the story begins…

2. What’s “Normal”? What’s Not?

In data systems, an anomaly is any value, or run of values, that deviates from what the data normally looks like.

Point anomaly: Revenue = 0 on a day that normally hits hundreds of millions.
Contextual anomaly: Clicks on Friday drop below Wednesday’s, which reverses the typical trend.
Collective anomaly: Traffic drops 20% for five straight days, but no one notices because each day looks “fine” in isolation.

The key question: How do you know when a data point is abnormal, if no one tells you?

3. Bad Data Isn’t the Scary Part. Not Knowing It’s Bad Is.

As your system scales:

from a few tables to hundreds,
from one pipeline to dozens,
from a single data source to many,

… you can no longer rely on visual checks or user complaints.

You need a digital sixth sense, something that constantly watches your data and knows when it strays from expectations.

4. How Can You Detect Anomalies?

Manual Rules:

“If record count < 1,000, send an alert.”
“If today’s revenue deviates by more than 50% from yesterday’s, send a Slack message.”

But seasonality exists: traffic drops every Sunday, and orders spike at month-end.
Static rules easily misfire or miss the issue altogether.
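
The two rules above can be sketched as a small check. The function name, its parameters, and the thresholds are illustrative assumptions, not part of any particular alerting tool.

```python
def check_static_rules(record_count, revenue_today, revenue_yesterday,
                       min_records=1_000, max_change=0.5):
    """Return alert messages for the two hand-written rules.

    All names and thresholds here are hypothetical; adjust to your pipeline.
    """
    alerts = []

    # Rule 1: too few records were loaded.
    if record_count < min_records:
        alerts.append(f"record count {record_count} is below {min_records}")

    # Rule 2: revenue moved by more than max_change relative to yesterday.
    if revenue_yesterday > 0:
        change = abs(revenue_today - revenue_yesterday) / revenue_yesterday
        if change > max_change:
            alerts.append(f"revenue changed by {change:.0%} vs. yesterday")

    return alerts


# A healthy day produces no alerts; a zero-revenue day trips rule 2.
print(check_static_rules(5_000, 95_000.0, 100_000.0))
print(check_static_rules(5_000, 0.0, 100_000.0))
```

The same fixed thresholds would also fire on a perfectly ordinary Sunday dip if it exceeded 50%, which is the kind of misfire described above.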

Statistical Methods:

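Z-score: Measures how far a point is from the mean, in units of standard deviation.
Standard deviation: Quantifies normal fluctuation, so values far outside that range can be flagged.
Rolling average: Averages a sliding window of recent values to smooth short-term noise, giving a baseline to compare new points against.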
Exponential smoothing: Gives more weight to recent data to track short-term changes.

Smarter Techniques:

Holt-Winters: Models trend and seasonality for more accurate predictions.
Isolation Forest, Autoencoder, Prophet: Use machine learning to detect complex patterns.

5. A Practical Example

Let’s take one example, daily Facebook Ads clicks, and test it with different anomaly detection methods.
The goal: see what each method can catch, and where it falls short.

The Context:

You’re tracking daily click counts for the past 21 days:

Day	Clicks
Mon W1	4100
Tue W1	4300
Wed W1	4350
Thu W1	4200
Fri W1	4400
Sat W1	2800
Sun W1	2500
Mon W2	4150
Tue W2	4350
Wed W2	4400
Thu W2	4300
Fri W2	4450
Sat W2	2700
Sun W2	2600
Mon W3	4180
Tue W3	4320
Wed W3	3450 ❗️– unusually low
Thu W3	4290
Fri W3	4430
Sat W3	2750
Sun W3	2580

Goal:

Is Wednesday Week 3 (3450 clicks) an anomaly?

1. Overall Mean

Overall mean ≈ 3790 clicks
3450 is below the mean by ~340 clicks (about 9%)
But it’s still higher than every Saturday and Sunday value, so it doesn’t stand out

Limitation: Doesn’t account for the fact that Wednesdays are usually high
Strength: Easiest to implement
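
As a quick check of the arithmetic, here is a minimal sketch that computes the overall mean from the 21 values in the table. The variable names are my own.

```python
from statistics import mean

# Daily clicks from the table, Mon W1 through Sun W3.
clicks = [4100, 4300, 4350, 4200, 4400, 2800, 2500,
          4150, 4350, 4400, 4300, 4450, 2700, 2600,
          4180, 4320, 3450, 4290, 4430, 2750, 2580]

WED_W3 = 16  # zero-based position of Wednesday, week 3

overall_mean = mean(clicks)
gap = overall_mean - clicks[WED_W3]

# Positions 5 and 6 within each week are Saturday and Sunday.
weekend = [c for i, c in enumerate(clicks) if i % 7 >= 5]

print(f"overall mean: {overall_mean:.0f}")
print(f"gap below mean: {gap:.0f} ({gap / overall_mean:.0%})")
print(f"above every weekend value: {clicks[WED_W3] > max(weekend)}")
```

Because the weekend lows drag the mean down, a weak Wednesday ends up looking unremarkable against it.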

2. Recent Mean (Last 2–3 Days)

Monday W3: 4180
Tuesday W3: 4320
Average = 4250
Wednesday W3 = 3450 → ~800 clicks lower (~19%)

Strength: Detects short-term shifts
Limitation: Ignores day-of-week patterns, so it might misread normal variations
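
A minimal sketch of this comparison, assuming a two-day window as in the numbers above. The helper name `drop_vs_recent` is hypothetical. It also scores an ordinary Saturday to show the limitation.

```python
from statistics import mean

# Daily clicks from the table, Mon W1 through Sun W3.
clicks = [4100, 4300, 4350, 4200, 4400, 2800, 2500,
          4150, 4350, 4400, 4300, 4450, 2700, 2600,
          4180, 4320, 3450, 4290, 4430, 2750, 2580]


def drop_vs_recent(series, index, window=2):
    """Fractional drop of series[index] below the mean of the previous `window` values."""
    baseline = mean(series[index - window:index])
    return (baseline - series[index]) / baseline


wed_w3_drop = drop_vs_recent(clicks, 16)  # baseline: Mon W3 and Tue W3
sat_w2_drop = drop_vs_recent(clicks, 12)  # baseline: Thu W2 and Fri W2

print(f"Wed W3 drop: {wed_w3_drop:.1%}")
print(f"Sat W2 drop: {sat_w2_drop:.1%}")
```

The perfectly normal Saturday scores a larger drop than the Wednesday in question, so any threshold that catches Wednesday will also fire every weekend.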

3. Standard Deviation (Same Weekday)

Wednesdays:
W1: 4350
W2: 4400
W3: 3450
Mean of the two prior Wednesdays (W1, W2) = 4375
Deviation ≈ 925 (~21%)

Strength: High precision with stable patterns
Limitation: Sensitive to outliers
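
The same-weekday comparison can be sketched as follows, using the two earlier Wednesdays as the baseline. Two samples are far too few for a trustworthy standard deviation, so treat the z-score as an illustration of the mechanics only. The cutoff of 3 is a common convention that I am assuming here, not something taken from the example.

```python
from statistics import mean, stdev

# Daily clicks from the table, Mon W1 through Sun W3.
clicks = [4100, 4300, 4350, 4200, 4400, 2800, 2500,
          4150, 4350, 4400, 4300, 4450, 2700, 2600,
          4180, 4320, 3450, 4290, 4430, 2750, 2580]

target = 16  # zero-based position of Wednesday, week 3

# Earlier values on the same weekday: every 7th value up to the target.
same_weekday = clicks[target % 7:target:7]

baseline = mean(same_weekday)
spread = stdev(same_weekday)  # sample standard deviation
deviation = baseline - clicks[target]
z_score = (clicks[target] - baseline) / spread

print(f"prior Wednesdays: {same_weekday}")
print(f"baseline: {baseline}, deviation: {deviation} ({deviation / baseline:.0%})")
print(f"z-score: {z_score:.1f}, flagged: {abs(z_score) > 3}")  # 3 is an assumed cutoff
```

With more weeks of history the spread estimate becomes meaningful, but a single bad Wednesday in the baseline would also inflate it, which is the outlier sensitivity noted above.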

4. Exponential Smoothing

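Predicts a smoothed value for that day (around 4380 in this walkthrough; the exact figure depends on the smoothing factor and on which days feed the model)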
3450 is far off → Flagged as anomaly

Strength: Smooths noise, responds to shifts
Limitation: Doesn’t understand weekday patterns
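
Here is a minimal sketch of simple exponential smoothing on the table data. The smoothing factor `alpha=0.5` is an arbitrary assumption, and the function name is my own. The forecast is computed twice, once on all days before Wednesday W3 and once on weekdays only, because the weekend lows change the result considerably.

```python
# Daily clicks from the table, Mon W1 through Sun W3.
clicks = [4100, 4300, 4350, 4200, 4400, 2800, 2500,
          4150, 4350, 4400, 4300, 4450, 2700, 2600,
          4180, 4320, 3450, 4290, 4430, 2750, 2580]


def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing.

    alpha=0.5 is an illustrative choice, not a recommended value.
    """
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


history = clicks[:16]  # everything before Wed W3
weekdays_only = [c for i, c in enumerate(history) if i % 7 < 5]

forecast_all = ses_forecast(history)
forecast_weekdays = ses_forecast(weekdays_only)

print(f"forecast, all days:      {forecast_all:.0f}")
print(f"forecast, weekdays only: {forecast_weekdays:.0f}")
print(f"actual Wed W3:           {clicks[16]}")
```

Under these assumptions the all-days forecast lands a little under 4000 and the weekdays-only forecast around 4300. Both sit well above 3450, but the gap between them shows how much a model with no notion of weekdays is pulled around by the weekend.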

5. Holt-Winters (With Trend + Seasonality)

Learns that weekdays run high and weekends dip
Predicts Wednesday clicks around 4370–4400
3450 = ~22% drop → Clear anomaly

Strength:

Recognizes normal high days
Ignores predictable lows (e.g., Sunday)
Knows when deviation is worth an alert

Limitation: Needs historical data and setup
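
To make the idea concrete, here is a hand-rolled sketch of additive Holt-Winters with a 7-day season, run on the 16 days before Wednesday W3. The smoothing parameters (0.3, 0.1, 0.3) and the simple initialization are assumptions chosen for illustration. A library implementation (for example `ExponentialSmoothing` in statsmodels) would normally fit them for you and would want more history than two weeks.

```python
# Daily clicks from the table, Mon W1 through Sun W3.
clicks = [4100, 4300, 4350, 4200, 4400, 2800, 2500,
          4150, 4350, 4400, 4300, 4450, 2700, 2600,
          4180, 4320, 3450, 4290, 4430, 2750, 2580]


def holt_winters_forecast(series, season_len=7, alpha=0.3, beta=0.1, gamma=0.3):
    """One-step-ahead forecast from additive Holt-Winters.

    Parameter values are illustrative assumptions, not fitted.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")

    first = series[:season_len]
    second = series[season_len:2 * season_len]

    # Initialize level, trend, and seasonal offsets from the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]

    for t in range(season_len, len(series)):
        x = series[t]
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (x - level) + (1 - gamma) * s

    # The next point reuses the seasonal offset for its position in the week.
    return level + trend + seasonal[len(series) % season_len]


forecast = holt_winters_forecast(clicks[:16])
actual = clicks[16]
drop = (forecast - actual) / forecast

print(f"forecast for Wed W3: {forecast:.0f}")
print(f"actual: {actual}, drop: {drop:.0%}")
```

With these settings the forecast comes out close to 4400, so 3450 is a drop of a bit over 20%. Run the same function one step before a weekend day and it expects the dip, which is why predictable lows do not trigger alerts.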

Summary Comparison

Method	Detects Anomaly?	Aware of Day Patterns?	Accuracy
Overall Mean	No	No	Low
Recent Mean	Somewhat	No	Medium
Std Dev (Same Day)	Yes	Yes	High
Smoothing	Yes	No	High
Holt-Winters	Yes	Yes	Very High

6. Final Thought – Let Your Data Tell You When Something’s Off

You can’t write a rule for every scenario.
You can’t watch dashboards all day.

You need a smart model that understands:

What today’s data should look like
Whether deviations are real problems
When to raise the alarm

That’s why anomaly detection exists: to guard your data systems like a silent sentry.

🔜 Next Up…

We’ll dive into one specific model: Holt-Winters, the hero for data with clear trends and cycles.
And we’ll use real Facebook Ads data to show how it works.