Got ~99% F1 on a fraud model… turned out to be misleading

Posted by Every-Mycologist5159@reddit | Python | 9 comments

I ran into something recently that changed how I think about ML evaluation.

I trained a fraud detection model and got ~99% F1.
At first, it felt like a huge win.

But after digging deeper, I realized it was caused by data leakage — the model was learning patterns it shouldn’t have had access to.
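To make that concrete, here's a minimal toy sketch of one classic leakage pattern. The field name `chargeback_count` is hypothetical (not from my actual dataset): the idea is that a post-outcome field only gets populated *after* fraud is confirmed, so any model that sees it scores near-perfect offline while being useless at prediction time.

```python
# Toy demo of label leakage: "chargeback_count" is a hypothetical
# post-outcome field that is only filled in after fraud is confirmed,
# so a model that uses it looks near-perfect in offline evaluation.

def f1(y_true, y_pred):
    """Binary F1 from scratch (positive class = 1)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy transactions: (amount, chargeback_count [leaky], label; 1 = fraud)
rows = [
    (120.0, 1, 1),
    (40.0,  0, 0),
    (980.0, 1, 1),
    (15.0,  0, 0),
    (60.0,  1, 1),  # small-amount fraud the leaky field still "knows" about
]
y = [r[2] for r in rows]

# "Model" with the leaky field: predict fraud iff a chargeback exists.
leaky_pred = [1 if r[1] > 0 else 0 for r in rows]

# "Model" restricted to features known at prediction time (amount only).
honest_pred = [1 if r[0] > 100 else 0 for r in rows]

print(f"leaky F1:  {f1(y, leaky_pred):.2f}")   # prints 1.00
print(f"honest F1: {f1(y, honest_pred):.2f}")  # prints 0.80
```

Nothing in that script errors or warns, which is exactly the trap: the leaky score is "correct" arithmetic over the wrong inputs.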

What surprised me was that nothing warned me.

No errors.
No obvious issues.
Just a “perfect” score.

Since then, I’ve been much more careful about how I split my data and which features could even exist at prediction time.

It made me realize how easy it is to build a model that looks great but fails in real-world use.
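One cheap sanity check I've started running (my own habit, not a standard tool): look for rows that appear verbatim in both train and test splits, since duplicated transactions are a common way leakage sneaks into a fraud dataset.

```python
# Quick leakage sanity check: what fraction of test rows also appear
# verbatim in the training split? Anything above ~0 deserves a look.

def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows that are exact duplicates of train rows."""
    seen = {tuple(r) for r in train_rows}
    hits = sum(tuple(r) in seen for r in test_rows)
    return hits / len(test_rows) if test_rows else 0.0

train = [[1, 0.5], [2, 0.7], [3, 0.9]]
test  = [[2, 0.7], [4, 0.1]]
print(overlap_fraction(train, test))  # prints 0.5 -> half the test set leaked
```

It won't catch subtler leaks (like post-outcome features), but it's fast enough to run on every split.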

💬 Curious:

How do you usually detect data leakage early?

What checks or tooling do you rely on?