98% of companies experienced ML project failures last year, with poor data
cleansing and lackluster cost-performance the primary causes
Posted by Some-Technology4413@reddit | programming | 102 comments
-grok@reddit
Management shows up with a wheelbarrow full of papers and says "LeT's uSe AI!"
Tyrannosaurus-Rekt@reddit
At my company I’m asked to gather data, train, validate, and deploy by myself. If that’s common I’d expect piss poor success rates 🤣
mccoyn@reddit
My company thinks we should write software to automatically label the training data.
baseketball@reddit
LOL, data scientists hate this one trick.
NotSoButFarOtherwise@reddit
I'm at a Fortune 500 company, and our team for PoCing generative AI applications is literally me plus some lawyers telling me what I can and can't do.
foreveronloan@reddit
At most companies it's this + 20 watchers who do nothing but make meetings about it.
Tyrannosaurus-Rekt@reddit
Then you go into those meetings and they try to give you 30 action items that would put infinite distance between you and the things that actually generate $$$$$
I was in a meeting not too long ago where I interrupted the speaker and was like "This would take around 4 months for me to implement. You're not signing me up for this, right? We have our next demo across the world in two weeks"
Silence....
Ilktye@reddit
It depends on what you are training the model for, and what the scope is.
Tyrannosaurus-Rekt@reddit
Definitely. There are viable one-person jobs, but I think the viable ones are usually assistance (easy to be helpful) rather than full automation (hard to get to 100%).
JanB1@reddit
A 60-80% success rate at labelling tickets and allowing for easier triage is better than no labelling at all. But a 60% success rate at identifying what a user wants in a customer-facing chat-bot or phone-bot for paying customers is more akin to a failure, if under the previous system users could determine exactly who they needed via the time-proven method of "Please press x for y", with a fallback of "Please press z for all other matters."
Ilktye@reddit
Yeah, exactly. Most of the tickets are about the same issues anyway, like locked accounts after holidays.
What really made the difference is that the help desk sees the model's accuracy estimate. If the model says "60% accuracy", the help desk can think maybe the model is just full of shit :)
JanB1@reddit
I think it should be standard to annotate the confidence level on AI-based decisions/tasks. I think this would also help with the "Well, ChatGPT said it, so it must be true" problem. In general, it should always be labelled whether AI was involved, and to what extent.
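As a concrete illustration of annotating confidence: here is a minimal sketch assuming a scikit-learn-style classifier on toy data — the 0.8 routing threshold and all names are made up:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for ticket-labelling data; real features would come from
# ticket text, metadata, and so on.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X[:400], y[:400])

proba = clf.predict_proba(X[400:])
labels = clf.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)

# Annotate every decision with its confidence; below a (made-up) threshold,
# route the case to a human instead of pretending the model is sure.
for label, conf in zip(labels[:5], confidence[:5]):
    route = "auto-label" if conf >= 0.8 else "human triage"
    print(f"class={label}  confidence={conf:.0%}  -> {route}")
```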
Ilktye@reddit
100% agreed.
aradil@reddit
This is what I'm trying to train my upper-level management on. Yes, you can "throw" ML at anything. A lot of it will be useless. But there are some tasks it's really, really good at, so long as you expect some false positives and false negatives.
James_Jack_Hoffmann@reddit
Mate, at the company that laid me off this year, all they did was wrap off-the-shelf models, charge the client 150 AUD an hour, and maybe add a sticker that says "AI-powered"
Hugging Face models make money printer go brrr
Tyrannosaurus-Rekt@reddit
Yes. “No sense in redoing work that is already done. Build an application around their model”
Application completely shits the bed because the model was trained on pictures in commercial lighting conditions 😂
Or it was trained only on Indians so it can’t detect white people 😭
This field is stupid when management is confused
m3rcuu@reddit
Of course, all in one week, and the model must be spot on!
Tyrannosaurus-Rekt@reddit
Yes. Then the boss tells me the model needs to have ~60 more output classes for next week's presentation: "just hook into one of the parameters in the model. The model should know what type of car this is."
Brother has no idea what ML is.
Noughmad@reddit
"Oh, you got 95% accuracy? Just spend two more days to make it 100%!"
GaboureySidibe@reddit
98% failure isn't something that comes from people doing too much, it's from an approach not working at all.
rmyworld@reddit
I don't remember typing this comment. Wth, why is it here?
GayMakeAndModel@reddit
My boss was straight up like we’re not spending billions of dollars for shit product.
vegetablestew@reddit
Oh shit this hits too close
LessonStudio@reddit
My company has a product which uses ML to solve a fairly valuable problem. I would not at all call the ML very advanced.
It takes a layered approach where it uses more than one ML model after another to accomplish the task.
No PhDs are going to be earned from this; but it does solve the problem very very very well.
What is super annoying is that the class of company which needs this solution is fairly large, typically 5,000-50,000 employees. This means they almost certainly have a "data science" group, often 20+ people. All PhDs. All. Usually math, stats, "data science", or ML if they are a recent hire.
In exactly zero cases have any of these groups produced a product which went into real time production. A few of them have a few jupyter notebooks where they take some data, screw with it, and then return a vaguely useful report. But nothing live like our product producing value in real time.
Our engagements with these companies are almost identical every time. We talk to someone in upper management. They get excited about our product. We give a few demos of it working very well.
Then they get their "data science" group involved.
There is exactly a zero percent chance we will have any progress after meeting with their data science people. Often the conversations are bizarre. They ask for our models. We say, "No, that is how we make money." They ask a few different ways. Then they start dropping off the video call, and the entire thing just dies.
Where we have had more success is to just put our foot down. When they say they want their "data science" people to talk to us, we say, "Well, it was nice knowing you. Bye bye." They say, "Wait, what?" and we explain: "Look, those academics are going to do two things: ask 'What are your models?' and then, after the call, say we don't have the credentials to do this kind of work because we don't have PhDs. So we aren't interested in wasting any more time with this company."
They get mildly defensive about their ML people and we say, "We aren't interested in being shut down by a group of academics who probably haven't produced squat in the last 5 years."
They then say something like, "No, they are a huge cost center producing nothing. We are hoping you can work with them." We reply that they don't want to work with us: to them we are inferiors, and we will also make them irrelevant.
We leave it at that, and often the engagement continues with the executives making fun of how useless their "data scientists" are.
I've been putting their title in quotes because anything which puts science in its title isn't a science at all.
And this last part is where academics fail hard at most practical ML. They are generally terrible programmers who are not good at solving problems. Problem solving is an art. Academic knowledge can help your problem-solving skills, but only if you have any to begin with.
It seems that the people I hear of who are kicking ass and taking names at places like DeepMind are both: highly skilled problem-solving programmers and highly knowledgeable academics.
The reality of ML is that so many tools and libraries are now available to non-academic programmers that this sort of thing is not very hard anymore. There are very few areas in the real world which require highly esoteric academic knowledge to solve the problem.
Yet I see companies that snobbishly distinguish between "ML engineers" and "Data Scientists" in an attempt to maintain their lofty status.
Here is an example of just how crappy the sort of PhD ML people I've dealt with are:
I gave them a one year data pull from a sensor database. The dates were in epoch seconds GMT (a standard in this particular industry), and the data was generated using a query where I used a range which resulted in the first second of the next year also being in the csv. So 31,536,001 rows of data instead of 31,536,000.
This whole team (about 8) were unable to deal with the dates, and were entirely flabbergasted by the extra row. They demanded I "fix" the dates, and that I give them the correct number of rows.
This was data for them to do R&D on, not feed into some already built system.
Think about that. 8 ML PhDs couldn't convert Unix dates or delete one row from a csv. WTF?
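For concreteness, the entire "fix" is a few lines of pandas. A minimal sketch, with the year and column name invented (any non-leap year gives 31,536,000 seconds):

```python
import pandas as pd

# Rebuild the pull as described: one year of per-second epoch timestamps
# plus the stray first second of the next year.
start = pd.Timestamp("2022-01-01", tz="UTC").value // 10**9
df = pd.DataFrame({"epoch_s": range(start, start + 31_536_001)})

# Convert epoch seconds to datetimes, then keep only the target year,
# which also drops the extra row.
df["timestamp"] = pd.to_datetime(df["epoch_s"], unit="s", utc=True)
df = df[df["timestamp"] < pd.Timestamp("2023-01-01", tz="UTC")]
print(len(df))  # 31,536,000
```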
How are these fools going to properly clean up real-world noisy sensor data, with all the wonders often found there (dropouts, extreme outliers such as a pressure meter reading 12 million PSI, etc.), if they can't deal with an epoch-second date format or an extra row? Also, there are subtleties in this sort of data they never asked about, such as flow meters which occasionally get re-calibrated, which means there is both drift and then sudden shifts in how those values relate to the system.
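The basic form of that cleanup is similarly small. A sketch with invented thresholds; knowing the physically plausible range is exactly the kind of thing the domain people have to tell you:

```python
import numpy as np
import pandas as pd

# Toy pressure trace with the failure modes described: a dropout (NaN)
# and an absurd outlier (a 12-million-PSI reading).
psi = pd.Series([102.0, 101.5, np.nan, 12_000_000.0, 103.2, 102.8])

psi = psi.mask(psi > 10_000)     # blank out physically impossible readings
psi = psi.interpolate(limit=3)   # bridge short dropouts only
print(psi.tolist())
```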
Oddly enough they never produced anything of value, other than some very significant billing.
And this is where another ML project failed, like many, many others I have seen: way too many, way too "overqualified" people given a task which is simply far beyond not only their skill set, but often their basic problem-solving aptitude.
It is far far far easier for a competent problem solving developer to learn enough ML to do very well, than for an ML academic to become a competent problem solving developer.
eraser3000@reddit
May I ask what the ml your company uses does?
LessonStudio@reddit
Without going into the super details: some of it is boring python, bringing in boring data, cleaning it up, then running it into boring keras sequences, which may then get some more cleanup, and then run into more boring keras sequences.
The big revolution is that I stopped using tensorflow, and started using JAX.
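As a rough sketch of that setup: Keras 3 can run on a JAX backend if the environment variable is set before the first import. The model and data here are made-up stand-ins, not the author's actual pipeline:

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # must be set before importing keras

import keras
import numpy as np

# Toy stand-in for cleaned-up data feeding a "boring keras sequence".
X = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```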
Here are some of my miraculous discoveries:
Talk to the people who are "boots on the ground". They might have some strange ideas and needs, but they will often tell you things which are game changers when it comes to interpreting data.
Understand what is going on. That is, have a very good idea of the physics of the situation; not to the point where you could write a simulation, but certainly the groundwork for that sort of thing. If you don't have a fairly solid gut-level feel for why the data is the way it is, then the ML will produce nonsense, and you won't have the gut feeling that it is wrong.
eraser3000@reddit
That makes sense. Jax seems interesting, perhaps one day I'll venture into knowing it. As of now torch is already stressing me enough lol
LessonStudio@reddit
Use keras. It keeps things clean and simple. You can go deeper and stay within keras, or you can just use what you learn there and do it at a lower level later.
eraser3000@reddit
Ty, I'll remember this
Plank_With_A_Nail_In@reddit
How is it possible to know they haven't done it? Why would they tell you?
LessonStudio@reddit
Because their bosses told me they have produced nothing.
This is the same story over and over.
Here is a quote from one executive:
"Why is your BS any better than the BS I've been getting for the last 5 years from our own team. The only thing I've learned about AI is that they always need better data."
ammonium_bot@reddit
Hi, did you mean to say "more than"?
Explanation: If you didn't mean 'more than' you might have forgotten a comma.
Sorry if I made a mistake! Please let me know if I did. Have a great day!
rmyworld@reddit
Are there any resources you can recommend to "non-academic programmers", so that they can learn to build things that are actually useful with ML?
I've been trying to get into the field, but it seems difficult to achieve without having to go through all the "academic" side of things.
LessonStudio@reddit
Learning and doing fully viable and practical ML is quite easy. The tools are getting very mature, and the machines very powerful.
My recommendation is to find a problem which interests you; but one where you can get data. Then, attack it. Just keep googling how to do X. This will then result in a bit of a mess but you will get your hands dirty and now understand what you don't know.
Now look at various online courses, such as the ones on LinkedIn and YouTube. There are piles. But you will now be able to filter out the BS from the good stuff. Most of it is BS which starts blah-blahing about types of ML such as classification, etc. That is just crap, good for passing an ML 101 test; you will learn most of it in 10 seconds when you get your hands dirty.
A good course will cover good visualizations and various modern methods for solving different problems. The reality is that quite a few problems are easily solved with something as basic as a linear regression or a random forest. Vision is pretty much a whole field on its own, as is speech.
But, and this is where the "academics" will punch you in the balls: if you want a job at a big company with the people I am complaining about, you will hit a wall of gatekeepers. If you don't have a graduate degree, forget about it. Even then, many of them have questions like "How many papers have you published?" They will also put you through grinding interviews which are graduate-level math exams. What they won't ask is to show them some cool problem you have solved well; they won't, because you might ask them the same question, and their answer is probably just going to be jargon for "None".
Where someone without a graduate degree in this will do just fine is working for a normal software development company where ML could be applied to solve useful problems.
Maybe you sell farm supplies and want to build a recommender for other cool products on your website. This is super easy, and other than stumping 20 PhDs, is something you could poop out in under a week (see the sketch below). Or you are looking to mine data from that same farm supply company's database for the best list of customers for different marketing campaigns. With some stats 101 and some simple ML, this is not a hard problem to solve.
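A minimal sketch of that recommender idea, using nothing but pandas co-occurrence counts; the shop, orders, and products are all invented:

```python
import pandas as pd

# Toy order history for the hypothetical farm-supply shop.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2, 3, 3, 4],
    "product":  ["fence wire", "post driver", "fence wire", "gloves",
                 "post driver", "gloves", "feed bucket", "fence wire"],
})

# "Customers who bought X also bought Y": count how often each pair of
# products appears in the same order.
pairs = orders.merge(orders, on="order_id")
pairs = pairs[pairs["product_x"] != pairs["product_y"]]
co_counts = pairs.groupby(["product_x", "product_y"]).size()

# Top product to recommend alongside "fence wire".
print(co_counts["fence wire"].sort_values(ascending=False).head(1))
```

Real traffic would need deduplication, popularity damping, and so on, but the core logic really is this small.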
eadgar@reddit
Had this experience. People working more than a year on something, producing maybe one report with some graphs, not being able to create anything that could be used in production on a regular basis. But clients paid for it so all good?
LessonStudio@reddit
I dealt with one guy in a large company who said, "I've been carving notches in my office desk for every failed AI project since the 90s. My desk has a two foot hole in the center."
OCD_DCO_OCD@reddit
I had no clue even big companies had shit database structures and generally bad data. Everyone went on expensive courses on “big data” and the like the last 10 years and every time I try and pop the hood it’s a clusterfudge. Was it just PR sending people to those courses?
schmuelio@reddit
In my experience, most (larger/non-startup) companies have shit data organization for three reasons.
The first is a handful of people at the company "just want to get their work done", which generally means they're either pressed for time or focused only on the end result of their work. This type of practice leads to people not bothering to follow established processes, so a lot of stuff gets done ad-hoc (which leads to inconsistencies).
The second is that a lot of people, rather than asking around and finding out the correct process (and then following it) will choose an existing result and do the same thing. So someone else sees this ad-hoc work, assumes that since it was accepted it is probably good enough, and does their work to the same standard.
The third reason is legacy: if you're usually pressed for time (or your company is big enough), then accepting things that currently work is easier than redoing them so they're properly organized or aligned.
The end result is that, in general, it's easier to do disorganized work than it is to clean up disorganized work. Couple this with management's general lack of interest in things being done in an organized way (and add several years) and you end up with a big messy pile of "stuff", where people who have worked at the company for a long time know where to find things, new people duplicate things (see reason 1), and nobody knows why it's all so disorganized.
Execute_Gaming@reddit
Clean, large-scale data collection is one of the biggest challenges in the field. It's partially why models trained on synthetic data generated from computers have done well in the last few years (see DepthAnything2 and Microsoft's Metahuman-based face detection). OpenAI also allegedly has ChatGPT self-regulate/train itself to ensure safety.
metahivemind@reddit
This reminds me of the Amway scam. Every success is because of OpenAI; every failure is because you didn't do it properly.
SoniSins@reddit
It's time we burst the AI bubble.
Intelligent_Volume74@reddit
It's no coincidence that there's a boom in data engineering job openings. Companies have realized that you can't do data science with bad data. I think we'll have a few years of engineering maturation before the data science hype comes back, but by then I think we'll have evolved to the point where data analysts can do more data science work themselves (just look at the progress of SageMaker on AWS and BigQuery ML on Google Cloud).
Kinglink@reddit
AGAIN: please consider the SOURCE of this study.
Yeah, this is just bullshit propaganda.
josluivivgar@reddit
I think we all see the writing on the wall while working at our companies and watching this stupid AI craze.
AI has always been useful for multiple things, but a lot of the companies that are into AI right now are probably going to fail at using it, because all they're doing is tacking a glorified chat box onto their app at a pretty high cost for almost no benefit.
it's not profitable to add AI to everything.
they're solving a problem that's not there.
Then there are the companies that are like "omg, it's happening, in like 1 year I can fire everyone and let AI earn me money", and that's also unlikely to happen.
Companies that already used AI, or that are tackling a real problem and leveraging AI, are the ones that will see success and profit from these past breakthroughs...
and it's still a costly business that can be risky because of the initial requirements, so even companies doing the right thing might run out of money before they can successfully leverage AI to solve whatever they were tackling.
PoolNoodleSamurai@reddit
Oh, if it has a “GPU- patented SQL engine” [sic] then it must be special. “GPU- “s don’t just patent any old boring SQL engines.
Recoil42@reddit
It is, in general, a pretty unremarkable claim even if you ignore the source.
New technology appears, companies stumble as they work to adopt it for the very first time?
Who woulda thunk it?
Barbanks@reddit
Heck, people STILL underestimate how much it costs to build an app and how complex it can actually be. Now people want to all get into ML, A.I.? Gtfo.
I remember during the NFT craze someone told me to put them into my workout app…..again, my workout app…a lot of “visionaries” and marketers just love to use buzzwords without understanding the ramifications of what they’re asking.
Kinglink@reddit
This is also absolutely true... People don't understand the size and scope of the business they are in.
If I told you 80 percent of new restaurants failed during COVID, I'm sure you would think, "Well, why didn't the government help them out?" But the real fact is that 80 percent of new restaurants fail in the first two years, and have for a long time.
I'm pretty sure 90% of games don't make back their investment, probably even higher. Sounds terrible, but that's including every indie game, where most don't do that well (as well as mobile games, even from big studios; a lot of them throw shit at the wall, and while one might be successful and keep getting developed for years, the other 4-5 they made to see what sticks would be counted as failures).
Almost all consumable media, and almost every startup, has low hit rates, but shrug, people don't really pay attention to what the statistics actually say.
Additional-Bee1379@reddit
r/programming upvote flowchart:
Would I like it if this news was true?
Instead of:
Is this a well analyzed article?
Kinglink@reddit
That's true of most of Reddit (and the internet), but in this case it's really obvious that there's something wrong here (98 percent of projects probably didn't fail last year), and I dug deeper on it last time I saw it.
NormalUserThirty@reddit
next year we'll get that number up to 99%
Malmortulo@reddit
"improved tracked metrics by 50% of remaining SLA"
HolyPommeDeTerre@reddit
For 2026, AI is expected to go above and beyond with 102%!
EnoughWarning666@reddit
Well, you see, this AI model goes to 102.
Does that mean it’s more accurate?
Well, it’s two more, innit? It’s not 100. You see, most chatbots, you know, will be at 100 accuracy. You’re on 100 here, all the way up, all the way up, all the way up, you’re at 100. Where can you go from there? Where?
I don’t know.
Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
Go to 102?
102. Exactly. Two more accurate.
ferlonsaeid@reddit
With a 2 percent margin of error!
WriteCodeBroh@reddit
And then we can spin up new classes to teach some new ML project management framework! Let's call it… Nimble! And we'll all get together and talk about the 99% failure rate and all the great, unscientific approaches we have to fixing it, but somehow that failure rate will never get better!
pysk00l@reddit
next year? There are still 2 months in the year!! Work harder not smarter!!
Conscious-Ball8373@reddit
We'll aim for at least five nines, thank you very much.
yetanotherx@reddit
I assume this refers to 98% of companies that had ML projects?
AndrewNeo@reddit
Doesn't sound like it. It's entirely possible that 6 of the 300 respondents just didn't use ML at all and the rest all had problems.
prehensilemullet@reddit
“Global Surveyz Research”? If not a typo that’s gold
CherimoyaChump@reddit
The Z stands for Zebra.
Simon_Drake@reddit
Or they didn't want to admit that they had problems so just lied and claimed they hadn't tried any ML projects.
joey_nottawa@reddit
My thinking is that companies with a FinOps role are already a pretty narrow selection.
manystripes@reddit
Yeah the headline just blanket says "98% of companies" as a super broad category, so it has to be some super narrow selection in the first place since I doubt that percentage of companies as a general category are even doing ML projects.
Conscious-Ball8373@reddit
Probably also self-selecting to some degree, though.
IndividualLimitBlue@reddit
It worked perfectly on our side: a brilliant ML engineering team and management (the CEO) with an ML tech background who trusted the team. We are the 2%.
Kinglink@reddit
No, you're part of the 100 percent. They categorize ANY issue as a "failure": if you ever did a run with low-quality data or got low-quality results, they'd call that a "failure".
Their "report" is bullshit.
That being said, Kudos, ML is going to be here for quite a while (if it ever goes away) and if you had a success with it, that's a good sign.
Glad to hear your management also knows its place (out of the way)
IndividualLimitBlue@reddit
I must admit I didn’t get the first part of your message.
Kinglink@reddit
Basically saying you're just part of everyone. They tried to make it sound like only 2 percent are successful, but their analysis is extremely low quality (they asked "What issues did you have?" but then counted every issue as a "failure").
IndividualLimitBlue@reddit
Oook I get it now 😁
Kinglink@reddit
Don't feel bad, I definitely wrote it like crap... :)
throwaway490215@reddit
That's great and all, but without any further reveal of what you actually did, this is useless.
It's not that you're asking us to trust a random guy on the internet; we do that all the time. It's that you're asking us to trust that a random guy on the internet knows what he's talking about, without him talking about it.
For all we know you're bragging about opening an OpenAI account.
IndividualLimitBlue@reddit
I won't tell you shit except this: not a single call to third-party LLMs like OpenAI. Pure internal training and everything (cybersecurity).
PuffaloPhil@reddit
To the average reader in this subreddit there is nothing but generative AI, and nothing of value has ever been created in the broader field of machine learning… OCR doesn't exist, face recognition doesn't exist, etc.
mailed@reddit
Very curious, as I'm also part of a cyber team that's doing analytics but nothing beyond that. Was it all focused on detections?
IndividualLimitBlue@reddit
Yes, detection
uatec@reddit
How about poorly defined business outcomes, where the goal was apparently to have an ML project on the books rather than to have any output?
wavefunctionp@reddit
I’m not on the AI hype train, but I expect most projects with similar levels of novelty to mostly fail.
pyeri@reddit
Wouldn't all these problems be solved if we stuck to the basic rule that LLMs are useful only for grunt work, not sophisticated work requiring things like human insight, practical experience, and craftsmanship?
These are some of the tasks I often use ChatGPT for; notice that all of them can be categorized as "grunt work". The moment you step into "creative and insightful work" territory, like writing the actual article or building and compiling the actual app, it will start to feel overwhelming!
I don't know what use ML had in these companies, but if it's classic build or devops work, it's probably more than just grunt work?
bwainfweeze@reddit
How many businesses do you know who figure out what work this is except by the hard way?
How many forget it during the first round of layoffs? Or the second?
idebugthusiexist@reddit
I'm having an aneurysm contemplating this
SneakyDeaky123@reddit
Turns out when you scream and rush people to do something that must be done carefully and well, and make them work with poor quality data, it doesn’t turn out well
bwainfweeze@reddit
I don’t know how anyone who has been through a product lifecycle including the requirements gathering phase more than once can still be an optimist about companies getting data right without constant haranguing by the development team.
Nobody knows what they want, and nobody is ever happy getting what they asked for.
zerothehero0@reddit
A tale as old as the Sun, garbage in garbage out.
B1WR2@reddit
I think it’s shitty business strategies and initiatives.
Brojess@reddit
Data engineering is the bedrock of ML and statistics, but it is often ignored, and data scientists, who aren't trained in proper warehousing and storage, are forced to do it.
LloydAtkinson@reddit
Is there a writeup of this somewhere? You know what it's like trying to make execs and corporate types read anything, let alone a PDF.
throwaway490215@reddit
The other 1.999% either had an engineer smart enough to sell their SQL improvements as AI, or they haven't gotten the memo yet that the "We're doing AI" hype is coming down and you don't need to pretend to be good at it anymore.
Additional-Bee1379@reddit
Are you trying to say there aren't any successful ML applications?
SenatorStack@reddit
I wonder how many of those projects did not have solid data engineering practices in place.
TastiSqueeze@reddit
AI processing almost always means scrubbing large volumes of data at very low efficiency. A finely tailored solution extracting small amounts of highly relevant data is almost always better overall. It takes a highly skilled and knowledgeable person to implement that kind of solution, which is why most current efforts are stuck with large-volume, low-efficiency, high-cost approaches.
gumol@reddit
if your company doesn't have failing projects, it means it's not pushing hard enough
Stimunaut@reddit
Was this line of self-sustaining logic written by a manager?
gumol@reddit
Not really. Projects failing is a normal thing. Not everything has to lead to revenue and profit. Some things can be explored and left to die on the vine.
GAMEchief@reddit
98% of projects shouldn't 🤪
gumol@reddit
98% of companies experiencing failures doesn't mean 98% of projects failed
If you have 10 different projects, 9 succeeded, 1 failed, you're part of this statistic.
Scavenger53@reddit
its not "explored" its pushed on devs who dont have the resources to do it correctly then it blows up in the companies face. they arent building the team correctly, or gathering the necessary data, or handling it properly, they are winging it.
lukemcr@reddit
👀 I expect a lot of "no comment" comments here, from myself included haha
znihilist@reddit
Here is a short description of my time at "data driven" tech companies.
"Please add this feature." I've added it, and it doesn't improve the model. "Well, we want it there anyway."
"This modeling approach doesn't work; we can't predict the thing you want from the data you want me to use." The answer from stakeholders: "Have you tried this test?" Yes. "Okay, what about this other test?" Yes. "Alright, but what about this other other test?" It is not relevant to us.
"We want the model to be interpretable." (I tried to explain that when pressed on specifics, what they really meant was simple, but no, they know the word "interpretable".) The model ended up needing something complex but interpretable; the project got shelved as not "interpretable".