Ben Writing

A Sunny Day, Broken Data, and You Had No Idea

1. A Regular Morning at a Startup

At 8 AM, you open the tracking dashboard like always.
All ETL jobs from the previous day show success, at least according to the latest report.
Then Slack starts buzzing.

The PM sends alerts:

“Hey, why is all the TikTok data missing from yesterday’s report?”
“And what’s with Tuesday’s campaign revenue showing as zero?”

You open the ETL logs, check Snowflake, dig through S3, and spot something odd.

It seems the pipeline ran successfully, but the data was incomplete, and no one noticed.
Maybe the data had been wrong for days, and the system didn’t say a word.

And so the story begins…


2. What’s “Normal”? What’s Not?

In data systems, an anomaly is any value, or run of values, that breaks from the expected pattern.

  • Point anomaly: Revenue = 0 on a day that normally hits hundreds of millions.

  • Contextual anomaly: Clicks on Friday drop below Wednesday’s, which reverses the typical trend.

  • Collective anomaly: Traffic drops 20% for five straight days, but no one notices because each day looks “fine” in isolation.

The key question: How do you know when a data point is abnormal, if no one tells you?


3. Bad Data Isn’t the Scary Part. Not Knowing It’s Bad Is

As your system scales:

  • from a few tables to hundreds,

  • from one pipeline to dozens,

  • from a single data source to many,

… you can no longer rely on visual checks or user complaints.

You need a digital sixth sense, something that constantly watches your data and knows when it strays from expectations.


4. How Can You Detect Anomalies?

Manual Rules:

  • “If record count < 1,000, send an alert.”

  • “If today’s revenue deviates by more than 50% from yesterday’s, send a Slack message.”

But seasonality exists: traffic drops every Sunday, and orders spike at month-end.
Static rules easily misfire or miss the issue altogether.
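
To make the two rules above concrete, here is a minimal sketch in Python. The function name, the argument names, and the sample numbers are hypothetical, chosen only for illustration; the thresholds (1,000 records, 50%) are the ones from the rules.

```python
def static_rule_alerts(record_count, revenue_today, revenue_yesterday):
    """Return alert messages for the two hand-written rules (illustrative only)."""
    alerts = []
    # Rule 1: too few records were loaded.
    if record_count < 1_000:
        alerts.append(f"Record count too low: {record_count}")
    # Rule 2: revenue moved more than 50% versus yesterday.
    if revenue_yesterday > 0:
        change = abs(revenue_today - revenue_yesterday) / revenue_yesterday
        if change > 0.5:
            alerts.append(f"Revenue changed {change:.0%} vs yesterday")
    return alerts


# Hypothetical numbers: a routine weekend dip trips rule 2 ...
sunday_alerts = static_rule_alerts(5_000, revenue_today=40, revenue_yesterday=100)
# ... while a slow 5% daily decline never does.
weekday_alerts = static_rule_alerts(5_000, revenue_today=95, revenue_yesterday=100)
print(sunday_alerts, weekday_alerts)
```

The first call raises a false alarm on a normal Sunday, and the second stays silent even if the same 5% decline repeats for days, which is the misfire-or-miss problem in miniature.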


Statistical Methods:

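  • Z-score: Measures how many standard deviations a point sits from the mean.

  • Standard deviation: Flags values that fall outside the normal fluctuation range, for example more than 3 standard deviations from the mean.

  • Rolling average: Averages a recent window of data to smooth short-term noise, then compares new points against it.

  • Exponential smoothing: Gives more weight to recent data to track short-term changes.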
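
As a rough sketch of how the first three ideas combine, the snippet below computes a z-score for each point against a rolling window of the points before it. The series, the window size of 5, and the threshold of 3 are made-up values for illustration, not recommendations.

```python
from statistics import mean, stdev


def rolling_zscore_flags(series, window=5, threshold=3.0):
    """Return indices whose z-score against the previous `window` points exceeds threshold."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation of the window
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip the point
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical metric: stable around 100, then a sudden drop to 40.
series = [100, 102, 98, 101, 99, 103, 100, 40]
flagged = rolling_zscore_flags(series)
print(flagged)
```

Only the final drop is flagged here; the small wiggles around 100 stay well inside three standard deviations of their window.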


Smarter Techniques:

  • Holt-Winters: Models trend and seasonality for more accurate predictions.

  • Isolation Forest, Autoencoder, Prophet: Use machine learning to detect complex patterns.


5. A Practical Example

Let’s take one example, daily Facebook Ads clicks, and test it with different anomaly detection methods.
The goal: see what each method can catch, and where it falls short.

The Context:

You’re tracking daily click counts for the past 21 days:

Day       Clicks
Mon W1    4100
Tue W1    4300
Wed W1    4350
Thu W1    4200
Fri W1    4400
Sat W1    2800
Sun W1    2500
Mon W2    4150
Tue W2    4350
Wed W2    4400
Thu W2    4300
Fri W2    4450
Sat W2    2700
Sun W2    2600
Mon W3    4180
Tue W3    4320
Wed W3    3450   ❗️ unusually low
Thu W3    4290
Fri W3    4430
Sat W3    2750
Sun W3    2580

Goal:

Is Wednesday Week 3 (3450 clicks) an anomaly?


1. Overall Mean

  • Overall mean ≈ 3790 clicks

  • 3450 is below the mean by ~340 clicks (~9%)

  • But it’s still higher than Saturday or Sunday

Limitation: Doesn’t account for the fact that Wednesdays are usually high
Strength: Easiest to implement
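
A quick way to check this on the 21 days from the table (a sketch; the variable names are my own):

```python
from statistics import mean

# Daily clicks, Mon W1 through Sun W3, copied from the table above.
clicks = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,  # week 1
    4150, 4350, 4400, 4300, 4450, 2700, 2600,  # week 2
    4180, 4320, 3450, 4290, 4430, 2750, 2580,  # week 3
]
wed_w3 = clicks[16]  # index 16 is Wednesday of week 3

overall_mean = mean(clicks)
gap = overall_mean - wed_w3

# Weekend values sit at positions 5 and 6 of each 7-day week.
weekend_max = max(clicks[i] for i in range(len(clicks)) if i % 7 in (5, 6))

print(f"overall mean = {overall_mean:.0f}, gap = {gap:.0f}")
print(f"Wed W3 above every weekend value: {wed_w3 > weekend_max}")
```

Because the weekend lows pull the overall mean down, the bad Wednesday ends up only a few percent below it and still above every Saturday and Sunday.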


2. Recent Mean (Last 2–3 Days)

  • Monday W3: 4180

  • Tuesday W3: 4320

  • Average = 4250

  • Wednesday W3 = 3450 → ~800 clicks lower (~19%)

Strength: Detects short-term shifts
Limitation: Ignores day-of-week patterns, so it might misread normal variations
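
The same comparison as a small sketch, using a two-day window as in the numbers above. The helper name is hypothetical. The second call shows the limitation: a perfectly normal Saturday looks even worse than the broken Wednesday.

```python
def drop_vs_recent(recent_values, today):
    """Fractional drop of `today` relative to the mean of `recent_values`."""
    recent_mean = sum(recent_values) / len(recent_values)
    return (recent_mean - today) / recent_mean


# Wed W3 compared with Mon W3 and Tue W3.
wed_drop = drop_vs_recent([4180, 4320], 3450)

# Sat W3 compared with Thu W3 and Fri W3: a routine weekend dip.
sat_drop = drop_vs_recent([4290, 4430], 2750)

print(f"Wed W3 drop: {wed_drop:.1%}")
print(f"Sat W3 drop: {sat_drop:.1%}")
```

Any threshold loose enough to ignore the Saturday dip would also ignore the Wednesday drop, so the recent mean alone cannot separate the two.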


3. Standard Deviation (Same Weekday)

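  • Wednesdays: W1 = 4350, W2 = 4400, W3 = 3450

  • Mean of the two previous Wednesdays = 4375

  • Sample standard deviation of those two values ≈ 35

  • Deviation ≈ 925 (~21%), far outside the normal Wednesday range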

Strength: High precision with stable patterns
Limitation: Sensitive to outliers
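
Here is a sketch of the same-weekday check, scoring Wednesday W3 against the two earlier Wednesdays. Two samples are far too few for a trustworthy standard deviation, so treat the z-score as illustrative; in practice you would want many more weeks of history.

```python
from statistics import mean, stdev

previous_wednesdays = [4350, 4400]  # W1 and W2
wed_w3 = 3450

baseline = mean(previous_wednesdays)
spread = stdev(previous_wednesdays)  # sample standard deviation
z = (wed_w3 - baseline) / spread
pct_drop = (baseline - wed_w3) / baseline

print(f"baseline = {baseline}, spread = {spread:.1f}")
print(f"z-score = {z:.1f}, drop = {pct_drop:.1%}")
```

Note that the baseline leaves out the value being tested. Including 3450 in its own mean would drag the baseline down and shrink the gap, which is one way outliers make this method unreliable.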


4. Exponential Smoothing

  • Predicts a smoothed value near 4380 for that day

  • 3450 is far off → Flagged as anomaly

Strength: Smooths noise, responds to shifts
Limitation: Doesn’t understand weekday patterns
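
Below is a minimal sketch of simple exponential smoothing. The smoothing factor alpha = 0.6 and the choice of input series are my assumptions, and the forecast depends heavily on both, so this does not claim to reproduce the exact setup behind the figure above.

```python
def exp_smooth_forecast(values, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = values[0]  # start the level at the first observation
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# All 16 days before Wed W3 (Mon W1 through Tue W3), from the table.
history = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,
    4150, 4350, 4400, 4300, 4450, 2700, 2600,
    4180, 4320,
]
full_series_forecast = exp_smooth_forecast(history, alpha=0.6)

# Only the two earlier Wednesdays.
wednesday_forecast = exp_smooth_forecast([4350, 4400], alpha=0.6)

print(round(full_series_forecast), round(wednesday_forecast))
```

With these assumptions, smoothing only the earlier Wednesdays gives 0.6 × 4400 + 0.4 × 4350 = 4380, while smoothing the full daily series gives a lower forecast because the weekend lows are still echoing in the level. That second effect is the weekday blindness listed as the limitation.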


5. Holt-Winters (With Trend + Seasonality)

  • Learns that Wed–Fri are peak days

  • Predicts Wednesday clicks around 4370–4400

  • 3450 = ~22% drop → Clear anomaly

Strength:

  • Recognizes normal high days

  • Ignores predictable lows (e.g., Sunday)

  • Knows when deviation is worth an alert

Limitation: Needs historical data and setup
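
To show the mechanics, here is a hand-rolled sketch of additive Holt-Winters with a 7-day season. It is not a library implementation: the initialization is deliberately naive, and the smoothing parameters (alpha = 0.3, beta = 0.1, gamma = 0.3) are arbitrary assumptions rather than fitted values.

```python
def holt_winters_forecast(series, period=7, alpha=0.3, beta=0.1, gamma=0.3):
    """One-step-ahead forecast from additive Holt-Winters (needs >= 2 full seasons)."""
    first = series[:period]
    second = series[period:2 * period]
    # Naive initialization from the first two seasons.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]

    for t in range(period, len(series)):
        s = seasonal[t % period]
        prev_level = level
        # Update level, trend, and this weekday's seasonal offset.
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s

    # Forecast for the next time step, using that weekday's seasonal offset.
    return level + trend + seasonal[len(series) % period]


# Mon W1 through Tue W3: everything known before Wed W3.
history = [
    4100, 4300, 4350, 4200, 4400, 2800, 2500,
    4150, 4350, 4400, 4300, 4450, 2700, 2600,
    4180, 4320,
]
actual = 3450
forecast = holt_winters_forecast(history)
drop = (forecast - actual) / forecast
print(f"forecast = {forecast:.0f}, drop = {drop:.1%}")
```

Because each weekday carries its own seasonal offset, the Wednesday forecast is not dragged down by the weekend, and the same model would expect a low Sunday instead of alerting on it.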


Summary Comparison

Method               Detects Anomaly?   Aware of Day Patterns?   Accuracy
Overall Mean         No                 No                       Low
Recent Mean          Somewhat           No                       Medium
Std Dev (Same Day)   Yes                Yes                      High
Smoothing            Yes                No                       High
Holt-Winters         Yes                Yes                      Very High

6. Final Thought – Let Your Data Tell You When Something’s Off

You can’t write a rule for every scenario.
You can’t watch dashboards all day.

You need a smart model that understands:

  • What today’s data should look like

  • Whether deviations are real problems

  • When to raise the alarm

That’s why anomaly detection exists: to guard your data systems like a silent sentry.


🔜 Next Up…

We’ll dive into one specific model: Holt-Winters, the hero for data with clear trends and cycles.
And we’ll use real Facebook Ads data to show how it works.