Got ~99% F1 on a fraud model… turned out to be misleading

Posted by Every-Mycologist5159@reddit | Python | 9 comments

I ran into something recently that changed how I think about ML evaluation.

I trained a fraud detection model and got ~99% F1.
At first, it felt like a huge win.

But after digging deeper, I realized it was caused by data leakage — the model was learning patterns it shouldn’t have had access to.
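To make that concrete, here's a minimal toy sketch of one classic leakage pattern. The field name `chargeback_count` is hypothetical (not from my actual dataset): the idea is that a post-outcome field only gets populated *after* fraud is confirmed, so any model that sees it scores near-perfect offline while being useless at prediction time.

```python
# Toy demo of label leakage: "chargeback_count" is a hypothetical
# post-outcome field that is only filled in after fraud is confirmed,
# so a model that uses it looks near-perfect in offline evaluation.

def f1(y_true, y_pred):
    """Binary F1 from scratch (positive class = 1)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy transactions: (amount, chargeback_count [leaky], label; 1 = fraud)
rows = [
    (120.0, 1, 1),
    (40.0,  0, 0),
    (980.0, 1, 1),
    (15.0,  0, 0),
    (60.0,  1, 1),  # small-amount fraud the leaky field still "knows" about
]
y = [r[2] for r in rows]

# "Model" with the leaky field: predict fraud iff a chargeback exists.
leaky_pred = [1 if r[1] > 0 else 0 for r in rows]

# "Model" restricted to features known at prediction time (amount only).
honest_pred = [1 if r[0] > 100 else 0 for r in rows]

print(f"leaky F1:  {f1(y, leaky_pred):.2f}")   # prints 1.00
print(f"honest F1: {f1(y, honest_pred):.2f}")  # prints 0.80
```

Nothing in that script errors or warns, which is exactly the trap: the leaky score is "correct" arithmetic over the wrong inputs.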

What surprised me was that nothing warned me.

No errors.
No obvious issues.
Just a “perfect” score.

Since then, I’ve been much more careful about how I split my data and which features could even exist at prediction time.

It made me realize how easy it is to build a model that looks great but fails in real-world use.
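One cheap sanity check I've started running (my own habit, not a standard tool): look for rows that appear verbatim in both train and test splits, since duplicated transactions are a common way leakage sneaks into a fraud dataset.

```python
# Quick leakage sanity check: what fraction of test rows also appear
# verbatim in the training split? Anything above ~0 deserves a look.

def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows that are exact duplicates of train rows."""
    seen = {tuple(r) for r in train_rows}
    hits = sum(tuple(r) in seen for r in test_rows)
    return hits / len(test_rows) if test_rows else 0.0

train = [[1, 0.5], [2, 0.7], [3, 0.9]]
test  = [[2, 0.7], [4, 0.1]]
print(overlap_fraction(train, test))  # prints 0.5 -> half the test set leaked
```

It won't catch subtler leaks (like post-outcome features), but it's fast enough to run on every split.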

💬 Curious:

How do you usually detect data leakage early?

What checks or tooling do you rely on?