Most AI failures are not model failures but system design failures
Posted by bix_tech@reddit | programming | View on Reddit | 4 comments
When AI products fail, people often blame the model, but in reality, it is usually the engineering around it. Missing version control for data, weak observability, and unclear ownership across teams cause more downtime than algorithmic errors.
Treating AI pipelines as software systems, with clear contracts and monitoring, drastically improves reliability.
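For example, a "contract" can be as simple as the downstream stage validating what it receives instead of trusting the upstream job. A minimal sketch of that idea (column names and checks are purely illustrative):

    # Minimal sketch of a contract between pipeline stages: the downstream
    # stage validates its inputs instead of trusting them. The expected
    # columns and bounds here are hypothetical.

    EXPECTED_COLUMNS = {"user_id", "event_ts", "amount"}


    def validate_batch(rows: list[dict]) -> None:
        """Fail loudly at the pipeline boundary instead of silently degrading."""
        if not rows:
            raise ValueError("empty batch received from upstream stage")

        for i, row in enumerate(rows):
            missing = EXPECTED_COLUMNS - row.keys()
            if missing:
                raise ValueError(f"row {i} missing columns: {sorted(missing)}")
            if row["amount"] < 0:
                raise ValueError(f"row {i} has negative amount: {row['amount']}")


    if __name__ == "__main__":
        validate_batch([
            {"user_id": 1, "event_ts": "2024-03-07T12:00:00Z", "amount": 19.99},
        ])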
I'd like to hear how other engineers handle testing and monitoring for AI code.
programming-ModTeam@reddit
This is an ad for your webinar
s7stM@reddit
Every generation has its phenomenon you could say the exact same thing about.
Now it's AI/LLMs: if it doesn't work, you're doing it wrong!
BlueGoliath@reddit
It fails because AI sucks.
pvatokahu@reddit
This is spot on. At BlueTalon we learned this the hard way when we were building data governance tools. The actual algorithms for access control were straightforward - the nightmare was everything around them. We had one customer whose entire data pipeline would break because someone forgot to update a config file when they switched cloud regions. Not a model problem, just basic ops stuff that nobody thought about until it was 2am and everything was on fire.
The versioning thing is huge. We had this situation where our ML team kept tweaking their recommendation engine but nobody tracked which version of the training data went with which model deployment. So when things went sideways, we'd spend hours trying to figure out if it was the new model or if someone had changed the data preprocessing scripts. Eventually we started treating every data snapshot like a code release - full version tags, changelogs, the works. Also started requiring that any model deployment had to include the exact commit hash of the data pipeline code that generated its training set.
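A rough sketch of the kind of release manifest we ended up with (field names here are illustrative, not our actual tooling):

    # Sketch: a deployment manifest that pins a model release to the exact
    # data snapshot and pipeline commit it was trained from.
    from dataclasses import dataclass, asdict
    import json
    import subprocess


    @dataclass
    class ModelRelease:
        model_name: str
        model_version: str          # e.g. "recsys-2.4.1"
        data_snapshot_tag: str      # version tag of the training data snapshot
        pipeline_commit: str        # commit hash of the data pipeline code
        changelog: str


    def current_commit() -> str:
        """Commit hash of the data pipeline repo this script runs from."""
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()


    def write_manifest(release: ModelRelease, path: str = "release.json") -> None:
        # Refuse to ship a release that doesn't pin its training data.
        if not release.data_snapshot_tag:
            raise ValueError("deployment must reference a tagged data snapshot")
        with open(path, "w") as f:
            json.dump(asdict(release), f, indent=2)


    if __name__ == "__main__":
        write_manifest(ModelRelease(
            model_name="recommendation-engine",
            model_version="2.4.1",
            data_snapshot_tag="train-data-2024-03-07",
            pipeline_commit=current_commit(),
            changelog="Re-trained after preprocessing fix in feature cleanup step",
        ))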
For monitoring, we borrowed a lot from traditional software ops. Set up alerts for data drift, model performance metrics dropping below thresholds, latency spikes, all that. But the real breakthrough was when we started tracking "business metrics" alongside technical ones. Like if our fraud detection model suddenly started flagging 50% more transactions, that might technically be working fine but something upstream probably changed. Having product managers own some of these metrics alongside engineering made everyone care about the full system health, not just their piece of it.
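Boiled down, the health checks looked something like this (thresholds and metric names are invented for the example, not real values from our system):

    # Sketch of combined technical + business health checks: latency and
    # accuracy thresholds alongside a fraud flag-rate sanity check.

    def check_model_health(metrics: dict) -> list[str]:
        """Return a list of alert messages; an empty list means healthy."""
        alerts = []

        if metrics["p95_latency_ms"] > 250:
            alerts.append(f"latency spike: p95={metrics['p95_latency_ms']}ms")

        if metrics["rolling_accuracy"] < 0.92:
            alerts.append(f"accuracy below threshold: {metrics['rolling_accuracy']:.3f}")

        # Business-level check: a sudden jump in flagged transactions usually
        # means something upstream changed, even if the model itself looks fine.
        baseline = metrics["baseline_flag_rate"]
        if metrics["current_flag_rate"] > 1.5 * baseline:
            alerts.append(
                f"flag rate jumped from {baseline:.1%} to {metrics['current_flag_rate']:.1%}"
            )

        return alerts


    if __name__ == "__main__":
        print(check_model_health({
            "p95_latency_ms": 180,
            "rolling_accuracy": 0.89,
            "baseline_flag_rate": 0.02,
            "current_flag_rate": 0.031,
        }))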