Study: Experienced devs think they are 24% faster with AI, but they're actually ~20% slower
Posted by femio@reddit | ExperiencedDevs | View on Reddit | 318 comments
Link: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Some relevant quotes:
We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation [1].
Core Result
When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.
In about 30 minutes the most upvoted comment about this will probably be "of course, AI suck bad, LLMs are dumb dumb" but as someone very bullish on LLMs, I think it raises some interesting considerations. The study implies that improved LLM capabilities will make up the gap, but I don't think an LLM that performs better on raw benchmarks fixes the inherent inefficiencies of writing and rewriting prompts, managing context, reviewing code that you didn't write, creating rules, etc.
Imagine if you had to spend half a day writing a config file before your linter worked properly. Sounds absurd, yet that's the standard workflow for using LLMs. Feels like no one has figured out how to best use them for creating software, because I don't think the answer is mass code generation.
dsm4ck@reddit
Experienced devs know it's easier to just say what the bosses want to hear in surveys
femio@reddit (OP)
The estimations were from open source devs, not from devs in corporate environments under managerial pressure.
I think the difference comes more from prompting requiring less cognitive load than writing the code yourself. So it feels faster only because it feels easier.
dapalagi@reddit
This is the part that makes me think I am often more productive with the AI. Having to code everything myself might be faster in the short term, but day in day out it's more taxing. My brain has limits and is susceptible to burnout, tiredness, etc. If I were to move just as fast or even slower with the AI help, then I'll still take the AI help. To me, AI isn't necessarily great for companies or code bases. But it really helps on days when I'm not at peak performance, tired, or just plain don't give a shit. Overcoming inertia is hard and the AI is always ready to bang out something (even if it's shit the first go around).
Dany0@reddit
In the mind, memory is made up of events and time is only estimated. Unless devs make actual observations and note down the time they spend doing stuff, of course they'll be off
Honestly I wish it at least felt faster. There would at least be some upside. 20% slower for much less risk of burnout. It would certainly help managing ADHD symptoms long term. But no, in practice, it's just more work for less results. Wake me up when the AIs can make decisions
lasooch@reddit
I tried Claude Code recently on a super tiny personal project. I was actually surprised how well it did (I didn't have to correct literally anything - but I did ask it to basically replicate the same structure I have, just for a new db table, with well defined columns in the prompt, so it's not like it was a particularly complex task).
But I noticed that the waiting for the code to generate actually fucks with my ADHD. It's in that spot of "too long to just watch the command prompt, so I'll switch away for a second" and boom, distracted.
Had I written that same bit of code myself, while it would have taken longer, I probably would have done it in one go without ever switching away from nvim. I might get more adjusted to using it with more practice, but I think that for many tasks it actually makes my ADHD harder to deal with. And I suspect for bigger tasks it feels so much more like forcing myself to do another code review rather than writing code, and I enjoy the latter more.
Dany0@reddit
Damn brother, thank you for writing this out. I missed this even when I thought deeply, I mean fuck, I even meditated on this and completely missed something that was staring me in the face the whole time.
Waiting for LLMs drains ADHDers' limited willpower. It's also why I was so excited initially: when I was waiting and didn't know what it would spit out, it pulled me down a dopamine spiral. It's also why I love playing with LLMs on random stuff, exploring sciences where LLMs are a strong point like linguistics, reverse engineering or history. When I don't know the result, my brain actually loves it.
But by now, I have an idea of what the LLM will spit out and I dread the idea of having to fix it for the LLM and it's taking energy away instead of giving it to me
LastAccountPlease@reddit
And whatever you write you write once, you can't make a direct comparison.
edgmnt_net@reddit
Open source tends to be more strict about quality and long-term maintainability, though. The main market for AI tools seems more like custom apps and feature factories.
ewankenobi@reddit
A massive flaw in the study for me was the fact they weren't solving the same issues. Could it just be that the issues the AI developers were assigned turned out to be harder than expected? Not sure how you would quantify it correctly though.
gizamo@reddit
Tbf, many experienced devs also know to lie to their bosses for self preservation. Not saying that's relevant with this particular study, but it's certainly relevant to many AI discussions I've seen.
Pleasant-Memory-1789@reddit
Exactly. I rarely even use AI. But whenever I finish a feature earlier than expected, I always give credit to "using AI".
It sounds backwards. Why would I give credit to AI? Doesn't that make me look replaceable? It's actually the opposite:
It makes management think you're extremely AI competent. When cost cuts come around, they'll keep you around for your AI competence.
It sells the dream of replacing all the devs with AI. Even though it'll never actually happen, management loves the fantasy. Imagine those huge cost savings and massive bonuses.
It makes you look less like a try-hard and more like a wizard. So your peers envy you less and admire you more.
HideousSerene@reddit
I have not just one but several coworkers like you.
My favorite part is how some of them recently devised a "framework" for building with AI which was literally just using cursor and feeding in figma prototypes and jira tickets with mcp.
Now they're "rolling out the framework" to all engineers and fully expecting everybody to increase speed 20%.
You can literally see in our cursor account approximately 100% adoption already.
This is just shitty people trying to capitalize on shitty times. And hey, it's working for them.
Maybe you should apply to work at my company. You've got management material written all over you.
Pleasant-Memory-1789@reddit
Yep. Gotta randomly post your cool promptz in the team Slack channel to show off how amazing you are at generating AI slop.
praetor-@reddit
Is this the 49th law of power?
neilk@reddit
I’m not sure if you are just trolling but from what I’ve seen this would actually work in many companies
Pleasant-Memory-1789@reddit
Thank you, I am trolling lol. I would not do this but I swear it feels like my co-workers are spewing this bullshit.
MediocreDot3@reddit
The other day I had a meeting where 3 of us were just hammering away at chatgpt for a bug and I felt like a caveman
Historical_Emu_3032@reddit
I would have quit immediately.
3 devs in a room needing chatgpt to solve a bug, instead of peer programming is not a place I would want to work.
That's f'ing stupid.
Which-World-6533@reddit
I've found the people who use ChatGPT more tend to be poorer coders. It's a huge crutch.
nutrecht@reddit
Absolutely. And that's the biggest danger. These tools are nice tools for some boilerplate-y stuff. But poorer devs are going to use it as a crutch and generate a ton of useless crap.
I already see the worst devs in our group be the biggest fans, and for example generating all their unit tests from the code that is also spit out by the LLM (Copilot in our case).
deathhead_68@reddit
I think you've nailed it really. AI is actually incredible, but the shit devs among us really use it in the worst possible ways. They treat it like a programming oracle, and don't have the ability to know it's outputting crap.
Pleasant-Memory-1789@reddit
This also happened to me. It was honestly disturbing.
One hour on trying to reproduce, 2 minutes spewing Claude slop, then giving up and spending the last hour figuring out how we can convince product to not care about the bug anymore.
nullvoxpopuli@reddit
did you ever get reproduction steps?
my process is always:
1. human reproduction steps
2. codify the reproduction steps, and see how minimal we can make it -- some debugging (debugger, breakpoints, etc) could happen here to help with problem reduction
3. then debug for the fix, test should pass now (rough sketch of steps 2-3 below)
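For illustration, a minimal sketch of what steps 2-3 can look like. The parseAmount() helper and its bug are invented for this example, not taken from the thread; the point is that the codified reproduction is just a failing test that flips to passing once the fix lands.

```typescript
// Minimal sketch of steps 2-3, using a made-up parseAmount() bug purely for
// illustration. The test codifies the reproduction: it fails while the bug
// exists and passes once the fix lands.
import assert from "node:assert";

// Hypothetical function under suspicion (stand-in for the real code).
function parseAmount(input: string): number {
  return Number(input); // bug: "1,250.50" -> NaN because of the comma
}

// Step 2: the reproduction, reduced to the smallest input that still fails.
function testParseAmountHandlesThousandsSeparators(): void {
  assert.strictEqual(parseAmount("1,250.50"), 1250.5);
}

// Step 3: run it, watch it fail, set breakpoints, fix parseAmount, watch it pass.
testParseAmountHandlesThousandsSeparators();
console.log("repro test passed");
```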
Pleasant-Memory-1789@reddit
nope, Claude couldn't figure it out and our brains don't work that well anymore.
paperic@reddit
"our brains don't work that well anymore. "
What?
What do you mean "anymore"?
AdmiralAdama99@reddit
Not OP, but I imagine he means: In the post AI era, where devs ask AI instead of keeping perishable debugging and programming skills up to date.
Sorrus@reddit
This is a terrifying response lmao
oldDotredditisbetter@reddit
the fact that they even thought they're faster with AI just shows that they aren't as experienced as they thought
Ok_Passage_4185@reddit
I think it rather demonstrates that time flies when you're working on new shit, and drags when you're working on old shit.
They felt like they were getting things done because they were learning about the LLM. That's just how the brain works. It takes true analysis to identify how little value that interesting work is bringing to the table.
NuclearVII@reddit
Look, I hate these junk "tools" as much as the next guy with his head on straight, but this paper studied 16 developers - not what you'd call a serious sample.
Now, ofc if the 10x engineer claims were realistic, that'd be obvious even with a sample size this small, but no one sensible is defending that anymore.
another_account_327@reddit
16 developers who were very familiar with the code base. IMO AI is most useful when you're getting started with something you're not familiar with.
Ok_Passage_4185@reddit
"AI is most useful when you're getting started with something you're not familiar with."
I keep hearing this type of thing, but when I tried to get one to initialize an Android project directory, I couldn't get it to accomplish the task in an hour of trying:
https://youtu.be/U05JrrtVBuk
russels-parachute@reddit
Devs spending more time automating a task that feels tedious to them than the automation could ever save them, then feeling they were more productive that way? Not sure we can blame that one on AI.
PoopsCodeAllTheTime@reddit
But LLM is far from "automation", the same way that people don't refer to autocompletion as automation.
tooparannoyed@reddit
I offload tasks that I know AI will be able to do with a low likelihood of error or hallucination. I don’t care if it takes a little longer (but I don’t think it does), because it reduces cognitive load and allows me to apply that extra to something AI can’t do without making a mess.
Throughout my day, I always have a couple short sessions with AI that almost feels like a break. No need to look up syntax, specs, etc. Just chilling, prompting, letting AI do its thing and reviewing its output. Then it’s back to the real work, which would definitely take longer if I tried to teach a hallucination machine all the complicated pieces, edge cases and how to deal with creative user input.
inhalingsounds@reddit
EXACTLY.
People are measuring fast and slow and forgetting to measure how much brainpower we save on tedious stuff with proper use of AI.
PoopsCodeAllTheTime@reddit
I mean, that's just a convenience in the end, the same way dark mode might feel like less mental strain to some, but not to others. I'm perfectly fine with this perspective, but the zealots hate to hear it.
TimmyAndStuff@reddit
See the thing is I try to use AI like this. But I feel like when I break tasks down to a size that's actually manageable for the AI, they end up being so small that it's taking me longer to write the prompt and to review all the code than it would've taken me to just write it myself.
So to me having the AI write something ends up being more stressful and more cognitive load. I wish I could have these chill AI breaks you mention, but prompt engineering in a way that will actually produce results is the most annoying thing in the world for me. I feel like I have to be hyper specific and precise and use all these little tricks to the point where I'd rather write the code myself. Not to mention how many times I end up having to rewrite it myself anyway.
I really am trying to give AI a fair shot here. I have coworkers who do seem to be really successful with it. Honestly I just think it's a skill that is not a 1:1 match with programming, so for some devs it's great, but for people like me it's just not worth it. Tbh the only thing I really end up doing with it is leaving it on when I'm on lunch break or leaving it running on something in the evening and having something to look at when I get back.
MoreRopePlease@reddit
I use the AI to generate basic svgs for me, create short scripts, rewrite old lodash and jQuery stuff into modern JavaScript, explain syntax and specs to me, speculate on the causes of error messages. All of this increases my productivity and lets me focus on what I'm trying to do instead of chasing rabbit trails.
I don't have it create large chunks of code or unit tests. That's pretty useless ime. I think it's just another tool. Use it where it's useful, but experiment to figure out where it's useful.
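As an illustration of the lodash/jQuery-to-modern-JavaScript rewrites mentioned above, here's a hypothetical before/after; the legacy snippet and names are invented for the example, not taken from the poster's code.

```typescript
// Hypothetical before/after for the kind of lodash/jQuery-to-modern-JS rewrite
// described above (the legacy snippet and names are invented for illustration).
//
// Before (legacy):
//   var names = _.map(_.filter(users, function (u) { return u.active; }),
//                     function (u) { return u.name; });
//   $("#greeting").text("Hello " + names.join(", "));

interface User { name: string; active: boolean; }

// After: built-in array methods and template literals replace the library calls.
function greetingFor(users: User[]): string {
  const names = users.filter((u) => u.active).map((u) => u.name);
  return `Hello ${names.join(", ")}`;
}

// DOM update without jQuery (browser context):
//   document.querySelector("#greeting")!.textContent = greetingFor(users);

console.log(greetingFor([{ name: "Ada", active: true }, { name: "Bob", active: false }]));
```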
Ddog78@reddit
Finally! Someone who uses AI like I do. They're fun sessions - I'm taking a break when I'm using AI.
sebzilla@reddit
Same here! I actually do a thing I've dubbed the "AI sandwich"..
When I'm starting a new feature or task, I'll prompt some initial ideas and approaches, maybe have a 5-10 min chat with the AI..
Then I'll get to work, and there I write the code myself but I do use Copilot's autocomplete to semi-scaffold stuff and move a bit faster (I think?) while still being in charge of the code structure and implementation strategy.. This is where I spend the bulk of my time.
Then I will sometimes use Copilot Agent Mode or Cline to do the more routine stuff like write tests..
At the end, I use Agent mode to basically ask for a code review, looking for bugs, performance optimization improvements or other critiques. I would estimate that I take at least one suggestion every time (or something in the review inspires me to improve something somewhere).
This approach feels like a best of both worlds, I can start with what is effectively custom documentation for whatever I'm trying to build, and then I do the work myself with some smart AI-powered efficiencies so I'm in control, I know what's being written and that it does what it should, and then at the end i get a quick code review to help me do a polish pass.
Cazzah@reddit
Oh that's a good way of putting it.
It's absolutely easier to review work you just asked for than to write code from scratch. The cognitive load is absolutely a thing.
femio@reddit (OP)
Yeah, that's what's really fascinating to me. We can't even self-report our productivity gains reliably. Makes me feel like there's a realistic scenario where 2 more years and billions of dollars in LLM investment fails to beget AGI and there's a massive bubble burst.
Crack-4-Dayz@reddit
I have yet to hear anyone even attempt to sketch out a plausible mechanism for LLMs leading to anything that could be credibly labeled as "AGI" -- it's always just extrapolation of model improvements thus far, usually coupled with assumptions of exponential improvement over the long run.
In other words, I take the "fails to beget AGI" part of your realistic scenario to be the null hypothesis. However, I don't assume that such a failure will prevent corporate software development (at least in the US) from being widely transformed to be heavily reliant on "agentic" architectures that would make Rube Goldberg shit himself.
ToddMccATL@reddit
It's already happening from my experience and conversations, and the bigger the company, the more likely they are headed down that road at high speed.
HelveticaNeueLight@reddit
I was talking to an executive at my company recently who is very big on AI. One thing he would not stop harping on was that he thought in the future we’d use agents to design CI/CD processes instead of designing them ourselves. When I tried to ask him what he thinks an “agentic build process” would look like, it was clear he was clueless and just wanted to repeat buzzwords.
I think your Rube Goldberg analogy is spot on. I can’t even imagine what wild errors would be made by an agentic build pipeline with access to production deploy environments, private credentials, etc.
loptr@reddit
The guardrails today are very immature because the scenarios are new, and the security concerns/risks are very real.
But that will pass with time as people find the optimal ways to validate or limit actions/add oversight and LLM security matures in general. (A very similar thing is currently playing out in the MCP field.)
But "design" is also a very broad term (maybe that wasn't what they said verbatim or maybe their specific intention was clear already), it could simply mean to create the environments and scaffold the necessary iac (terraform/helm charts) according to the requirements/SLA for the tier etc.
For example a company can still build their own Terraform modules and providers (or some other prefabs), and have them as a selection for the LLM to choose from, and based on if it's a product built in expressjs or Go, pick the appropriate runtimes and deployment zones based on the best practices documentation. I.e. "designing" it for each product based on the company infrastructure and policies.
A second interpretation would be to use it to identify bottlenecks and redesign pipelines to be more optimal, but that's more one-time/spot work.
Either way it's not something that can necessarily be setup successfully today, but I don't think it's unfathomable to see it in the future.
maximumdownvote@reddit
I'm confused. Why -7 for this post? I don't agree with it all but it's a legit post
loptr@reddit
I think the simple answer is that it's too LLM/AI positive and triggers people's resentment for the general AI hype. But appreciate the acknowledgement.
Krom2040@reddit
I haven't heard anyone who is a serious technical contributor attempt to sketch out such a thing. I've heard many people gesticulate wildly about it who are making a bunch of money selling AI tools.
sionescu@reddit
In hindsight, it's not surprising at all: the developers who use AI and enjoy it, find it engaging which leads them to underestimate the waste of time and overestimate the benefits.
Adept_Carpet@reddit
Even if you don't like it, for a lot of devs, having an AI get you 70% of the way there with an easy-to-use, conversational interface, then cleaning it up and providing the other 30% with focused work, might take a lot less energy even if it turns out to take as much or more time.
the-code-father@reddit
Part of this though is the inherent lag involved with using all of these tools. There's no doubt it can write way faster than me, but when it hangs on request retries or gets stuck in a loop of circular logic, it wastes a significant amount of time.
Goducks91@reddit
I think as we leverage LLMs as tools, we'll also get way more experienced at figuring out what is a good task for an LLM to tackle vs what isn't.
NoobChumpsky@reddit
Yeah I think this is the key. There is a real divide between what execs think LLMs are capable of (you can replace a whole dev team with one person and the LLM figures it out!) vs. the reality right now (I'm maybe 15% more effective because I can offload rote tasks). I know what those rote tasks are after a bit of experience and I get how to guide the LLM I'm using.
The idea of AGI right now feels like a fantasy, but there are billions of dollars on the line here.
sionescu@reddit
This is precisely what's not happening: due to the instability of LLMs, they can't even replicate previous good output with the same prompt.
MjolnirMark4@reddit
I can definitely confirm that one.
I used an LLM to help me generate a somewhat complex SQL query. It took around 500ms to parse the data and return the results.
A few days later, I had it generate another query with the same goal as before. That one took 5-6 seconds to run when processing the same data as the first query.
Goducks91@reddit
Hmmm that hasn’t really been my experience.
maccodemonkey@reddit
LLMs are - by design - non-deterministic. That means it's built in that they won't give the same output twice.
How bad the shift between outputs can be varies.
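A toy sketch of why sampled decoding behaves this way (illustrative only, no vendor API involved): the model scores candidate tokens, softmax turns the scores into probabilities, and sampling from them can pick a different token on every run, while temperature near zero approaches stable, greedy decoding. The candidate strings and scores below are invented.

```typescript
// Toy illustration of non-deterministic sampling in LLM decoding.

function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function sampleIndex(logits: number[], temperature: number): number {
  const probs = softmax(logits, temperature);
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

// Hypothetical scores for three candidate completions.
const candidates = ["++", "+ 1", "+= 2"];
const logits = [2.0, 1.7, 0.4];
for (let run = 1; run <= 3; run++) {
  console.log(`temperature 1.0, run ${run}: ${candidates[sampleIndex(logits, 1.0)]}`);
}
console.log(`temperature 0.01: ${candidates[sampleIndex(logits, 0.01)]}`); // almost always "++"
```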
edgmnt_net@reddit
It's not just that, it's also building a model of the problem in your head and exploring the design space, which AI at least partly throws out the window. I would agree that typing out is tedious, but often it just isn't that time consuming especially considering stuff like open source projects which have an altogether different focus than quantity and (IME) tend to focus on "denser" code in some ways.
mcglothlin@reddit
I'm gonna guess a big part of it is that devs (including myself) are pretty bad at estimating how long something is going to take. 20% either direction is probably within typical error, any individual engineer couldn't report this accurately, and you could only show it with a controlled trial. So you do a task one way and you really won't know how long it would have taken you the other way but maybe using AI is more enjoyable so it feels faster?
I do wonder what the distribution is though. It seems like using AI tools correctly really is a skill and I wonder if some devs more consistently save time than others using the right techniques.
Deranged40@reddit
I used Copilot to generate a C# class for me today. Something that just about every AI model out there can get 100% right. Only thing is, I'm not sure I can give it a prompt that is less effort than just writing the class.
I still have to spell out all of the property names I want. I have to tell it the type I want each to be. Intellisense will auto-complete the
{ get; set; }
part on every line for me already, so I don't actually type that part anyway.
ByeByeBrianThompson@reddit
Or not even realize the time wasted checking the output is often greater than the time it would take to just write it. Checking code takes mental energy, and the AI code is often worse because it makes errors that most humans don't tend to make. Everyone tends to focus on the hallucinated APIs, but those errors are easy to catch. What's less easy is the way it will subtly change the meaning of code, especially during refactoring. I tried refactoring a builder pattern into a record recently and asked it to change the tests. The tests involve the creation of a couple of IDs using the post-increment operator and then updates to those IDs. Well, Claude, ostensibly the best at coding, did do a good job of not transposing arguments, something a human would do, but it changed one of the ++s to +1 and added another ++ where there was none in the original code. Result is the same number of IDs created, but the data associated with them was all messed up. Took me longer to find the errors than it would have to just write the tests myself. It makes so many subtle errors like that in my experience.
SnakePilsken@reddit
In the end: Reading code more difficult than writing, news at 11
beauzero@reddit
From the book that started it all. Thinking, Fast and Slow..."Causal explanations of chance events are inevitably wrong"...or thought about in this context human brains don't always interpret statistics correctly. Although I do agree with Adept_Carpet this may reflect level of effort or less tedium and therefore be perceived incorrectly as "faster" development time by those who use AI. I know I use LLMs to offload a lot of the boring template work and put more brain time on the fun stuff.
micseydel@reddit
In the social sciences, there's skepticism that g (general intelligence) is a real phenomenon. I think they're right, that AGI will never exist, and that AGI will be declared once it's economically useful enough even though humans will need to maintain it indefinitely.
potat_infinity@reddit
so humans arent general intelligence?
micseydel@reddit
It sounds like you clicked the link I provided and disagree with it. Can you say why?
potat_infinity@reddit
I was just asking for clarification
Schmittfried@reddit
I agree on being skeptical about AGI ever being a thing, but I don’t see how the g factor is relevant to that opinion.
TheTacoInquisition@reddit
I noticed the same thing with some devs (important to note, not all) when covid hit and working from home was mandatory. They were hands down less productive, but self reported being far more productive. Mainly, I think they were just happier, better worklife balance and working in an environment they liked better.
With AI I'm seeing a similar trend. Lots of time prompting and tweaking and making rules and revising the rules... with self reporting of being slightly more productive. But when you have a look at the output vs time, it's either almost the same as before or really quite a bit worse.
It could just be ramp up time to creating workflows and discovering processes that actually do make everyone faster in the long run, but the time being put into figuring it out is huge and there's as yet no way to know if there will be a payoff.
I've been liking using AI as well, I don't have to worry about the actual typing of every little thing, but unless I babysit it and course correct every little thing, it goes off piste very quickly and costs a lot of time to sort it out again. I've felt faster for sure, but looking back critically at the actual outcomes, I've spent more time on a feature than I thought I had, or just achieved less than I would normally have done.
muuchthrows@reddit
I’m interested in the productivity claim about working from home, do you have any studies or reading material about that?
TheTacoInquisition@reddit
Nothing I can share, the data would be from my company at the time. Of course, different people have different outcomes, we were just surprised when the self reporting for some didn't match up with reality. For some others the opposite happened. They had better productivity.
Not throwing shade at working from home, I have a 100% remote job now and will hopefully never go back to commuting. It's just interesting how self perception can be really off when it comes to actual output. For the AI discussion, I think its vital for us all to have some more measurable metrics than feelings, as those who LIKE AI are more likely to perceive a speedup vs those who do not. And even worse if C level execs mandate it and then use their feelings on the matter, when productivity may actually be harmed
muuchthrows@reddit
Thanks for the answer. Output is so extremely hard to measure, especially given that I find the largest time sink is organisations doing the wrong thing. If you’re working on the wrong thing then 0.1x productivity could actually be better than 1x, given that code is a liability and project failures destroy morale.
And I agree on your last part, it’s usually the execs who use their feelings and not data, be it about RTO or AI.
Brogrammer2017@reddit
How did you know your productivity metrics weren't the ones that were wrong?
lookmeat@reddit
Oh this is inevitable. Even if all the promises of ML were true, there will still be a bubble pop.
In the early 2000s the internet bubble popped. This didn't mean you couldn't make a business selling stuff on the internet or doing delivery over the internet; we know that can totally work. It popped because people didn't know how and were trying to find out. Some got it right, others didn't. Some were able to adapt, recover and survive, and many others just weren't. In the early 2010s everyone joked "you don't have to copy Google you know", but they don't realize that for the previous 10 years, if you didn't copy Google you were bound to make the same mistakes the 90s tech companies that busted did. Of course by now we certainly have much better collective knowledge and can innovate more, but still.
Right now with AI it's the same as the internet in the 90s: no one really knows what to do, what could work, what wouldn't, etc. At some point we'll understand what business there is (and while I am not convinced of most of what is promised, I do think there's potential) and how to make it work. A lot of companies will realize they made mistakes, and some will be able to recover, adapt and succeed, and many others just won't.
ThisApril@reddit
It feels like it's the https://en.wikipedia.org/wiki/Gartner_hype_cycle every time.
Though where that "Plateau of Productivity" winds up will be interesting. E.g., NFTs are further along in the hype cycle, but its non-scammy use cases are still vanishingly small.
awkreddit@reddit
Ed Zitron on Bluesky and his podcast Better Offline has been reporting on their shaky financial situation for quite some time now
mark_99@reddit
"Hey chat, what's the statistical significance of a self-reported study with only 16 participants over a single trial?"
maximumdownvote@reddit
I believe they refer to that number as zero sir.
thingscouldbeworse@reddit
Notice how everyone who's heralding the age of "AGI" is a salesperson. The concept is laughable. We cannot measure and do not fully understand human intelligence, much less the basic biological processes of the brain. The idea that we're close to creating a machine that operates in the image of one is sci-fi hokum.
daddygirl_industries@reddit
Yep - there's no such thing as AGI. Nobody can tell me what it is. OpenAI has something about it creating a certain amount of revenue - a benchmark that has absolutely nothing to do with its capabilities.
In a few years when their revenue stagnates, they'll drop a very watery "revised" definition of it alongside a benchmark that's tailored strongly to the strengths of the current AI systems to try to wring out a "wow" moment. Nothing will change as a result.
Imaginary_Maybe_1687@reddit
Unrelated gem of "follow metrics, not only vibes" lol
ColoRadBro69@reddit
I've been taking longer to get my own open source projects together. But I'm also doing stuff like animations, that I've never done before. My background and core skill set is in SQL and business rule enforcement; LLMs are allowing me to step further outside my lane.
micseydel@reddit
Can you link to your project?
ColoRadBro69@reddit
Here's one, I'm using ML to identify the subject of a photo and remove the background. There's a lot of software that can do that now, I was making this for icons.
https://github.com/CascadePass/Glazier
elperuvian@reddit
Isn’t that an already solved problem? LLMs had plenty of code to steal from
joe-knows-nothing@reddit
Relevant xkcd:
https://xkcd.com/1319/
_jnpn@reddit
In certain contexts, where the same kind of task pops up regularly, I do an inverse strangler fig pattern, where I gradually swap parts of the task with a tiny helper. Each effort should be an in-place quick win; month by month the thing grows enough to cut 50% or more of the original time required. After a while I can rewrite / clean it and maybe make it a standalone tool.
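A rough sketch of one way to read that workflow (the step names are hypothetical, not the poster's actual task): model the recurring chore as a checklist whose steps migrate from manual prompts to tiny helpers, one quick win at a time.

```typescript
// Sketch of a recurring task mid-migration from manual steps to tiny helpers.

type Step = { name: string; run: () => void };

const manual = (name: string, instructions: string): Step => ({
  name,
  run: () => console.log(`[manual] ${name}: ${instructions}`),
});

const automated = (name: string, helper: () => void): Step => ({
  name,
  run: () => { console.log(`[auto]   ${name}`); helper(); },
});

// The recurring task so far: two steps already swapped for helpers.
const releaseChecklist: Step[] = [
  automated("bump version", () => { /* rewrite the version field in package.json */ }),
  automated("update changelog", () => { /* prepend entries generated from git log */ }),
  manual("smoke-test the build", "run the app locally and click through the main flows"),
  manual("announce the release", "post the notes in the team channel"),
];

releaseChecklist.forEach((step) => step.run());
```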
summerteeth@reddit
Even when everything goes according to plan, investing in automation often slows short term development for long term gain (you hope; it's a bet on ROI).
When using AI tools in my own workflows I have been very much in learning mode. I am investing cycles into them, seeing if they have a long term ROI. Not sure if people who participated in this study were doing something along the same lines.
It is possible they will be faster in the future past the scope of the study. It’s also possible they won’t but just playing devil’s advocate here.
norse95@reddit
Yesterday I used copilot for a few hours to make a tool that saves me a minute or so a handful of times per day so this post is hitting home
xsdf@reddit
Interesting theory. In that way AI isn't a productivity tool but a morale boosting one
micseydel@reddit
More like morale borrowing - like tech or cognitive debt, if morale is only boosted because of a misunderstanding, you should expect to pay that boost back later.
beargambogambo@reddit
I automate stuff so that I don’t have to remember, that way the processes are deterministic.
Weird-Assignment4030@reddit
The bigger win is that I can go about 80% as fast, but I can also be doing something else at the same time, whereas manual coding is a 100% mentally active exercise.
JimmytheNice@reddit
The title of OP is misleading - the research was specifically about OPEN SOURCE developers, i.e. mostly people well acquainted with a codebase they also often maintain themselves.
So while I’m not an AI shill, this paper isn’t really that surprising - LLM use has diminishing returns the more you know your project. I could fix some SEV1 issues in my main project with my eyes closed as soon as I hear the description, but LLM usage isn’t the strongest in this context anyway.
cbusmatty@reddit
This is silly: “developer gets new tool that requires training and is slower”.
Show me the expert devs who use the tools effectively and are still slower, and then we can start talking, but that doesn't happen
codemuncher@reddit
Maybe, but the marketing is "sprinkle AI and fire all your devs and the 2 last ones will do the work of a 100 person team".
Sure "we know" that AI tools "arent like that", but really the marketing says it is so.
Besides which, computers should fit to our needs, not the other way around, so GET TO IT AI
cbusmatty@reddit
No marketing is like that, except from charlatan companies. That's a pure straw man, and this fake survey won't stop people who believe the charlatans
codemuncher@reddit
Overplaying your strengths, diminishing the weaknesses, using a rigged demo, etc, is totally standard PR/marketing, and always has been.
femio@reddit (OP)
Try reading the full study, that doesn't really cover most of the nuance.
For example, even factoring in a) being trained on LLM usage pre-study, b) getting feedback on improving LLM usage mid-study and c) ~44% of the devs in the study being experienced with Cursor before, the trends show a consistent deviation regardless. It didn't even improve over the 30-50 hours of using the tool, so it's not like it got better over time.
The study also makes it clear that this is a specific scenario where devs are working on codebases they know like the back of their hand (hundreds of commits over 3 years on average), and that it can't be applied to every task related to writing code or SWE work in general.
cbusmatty@reddit
I read the full study and it’s 16 developers this is ridiculous lol
Then-Boat8912@reddit
An experienced builder still needs to know when to use a hammer and when to use a nail gun.
Usual_Elegant@reddit
What about attention cost? If I get up from my computer for a couple minutes while the AI is churning through something does that get counted into the stat
DeterminedQuokka@reddit
No this is what I wrote in the assessment of vibe coding tests I did last week. I think I actually wrote “this took 24 hours longer than it would have taken me if I had just used autocomplete”
stevefuzz@reddit
I did a quick vibe coding check for creating some bash migration utils the other day. You know, just to say I tried. Started off ok, then went way off the rails. What a waste of time.
DeterminedQuokka@reddit
You know I feel like that’s what happens. I was generating tests and it did a great job for the unit tests. The second I tried to do anything more complex than call one function that parsed a string it freaked out and literally mocked everything in the function. I couldn’t get it to stop so just only merged the first third.
UnluckyPhilosophy185@reddit
Sounds like a skill issue
stevefuzz@reddit
I'm also an architect and I like to keep my finger on the pulse of the ai shit. I work for a company that uses AI (classic nn and ml) stuff for large production systems, so the LLM buzz has been going on here. Execs obviously want us to use them as a coding tool. So, here I am. For auto complete and boilerplate it's great, actually doing real dev, awful. We've also been playing with other use cases of LLMs as products. It's really interesting and great for some things, coding is not one of them.
DeterminedQuokka@reddit
I've got to tell you, my execs keep bringing up the boilerplate thing, and I don't know what everyone else is doing. But I have negligible boilerplate. And the boilerplate I actually have I wrote mixins for years ago.
Maybe I'm just not in the right frameworks.
I like AI and I think it's useful. But I think most of the cases where it's actually helpful I complete the task slower.
ghostwilliz@reddit
Yeah. I'm over here wondering why so many people have so much boilerplate. You really shouldn't need that much imo
BetterWhereas3245@reddit
Legacy spaghetti messes with no stubs, templates or rhyme or reason to how the code is structured. Small features or changes require lots more code than they should if things were written well.
At least that's been the one instance where "boilerplate" comes to mind as something the LLM can help with.
DeterminedQuokka@reddit
My best theory is that it must be people learning to code like constantly making new apps. Because even if you were doing that at a real company you would have a template so they are all the same
timhottens@reddit
To risk going against the prevailing sentiment here, this line in the study stood out to me:
56% of the participants had never used Cursor before, 1/4th of the participants did better, 3/4 did worse. One of the top performers for AI was also someone with the most previous Cursor use.
My theory is the productivity payoff comes only after substantial investment in learning how to use them well. That was my experience as well, took me a few months to really build an intuition for what the agent does well, what it struggles with, and how to give it the right context and prompts to get it to be more useful.
If the patterns we've seen so far hold though, in all likelihood these good patterns will start to get baked into the tools themselves. People were manually asking the agents in their prompts to create a todo list to reference while it worked to avoid losing context, and now Claude Code and Cursor both do this out of the box, as an example.
maccodemonkey@reddit
I think this is missing the forest for the trees. The key takeaway I think is that developers thought they were going faster. That sort of disparity is a blinking warning light - regardless of tools or tool experience.
Franks2000inchTV@reddit
There is 100% a huge learning curve to using AI tools.
I use claude code every day in my work and it massively accelerates my work.
But it wasn't always like that -- at first I made the usual mistakes, and it definitely slowed me down and made the code worse.
But these days I'm able to execute pretty complex tasks and quickly because I have a better sense of when the model is humming along nicely, and when it's getting itself into a hole or drifting off course.
And then once it's done, I review the code like it's a PR from a junior and provide feedback and have it fix it up. Occasionally I manually edit things when I need to demonstrate a pattern or whatever.
If you're slowed down by AI, or you're writing bad code with AI, that's a skill issue. Yeah it's possible to be lazy with it and it's possible for it to produce shit code, but that's true of any tool.
maccodemonkey@reddit
Dude. Here's my problem with comments like these:
And then:
Yes. That's what I said.
Don't tell me it's a skill issue when I came to the same conclusion you did. I didn't even say I wasn't going to use AI tools anymore. All I said was I was going to use Claude Code - specifically - less.
What's really annoying about some crowds is if you say anything negative about AI tooling they jump out of the woodwork and yell "skill issues!" and then if you look at their actual workflow it ends up they don't think the agent is a magic box either.
KokeGabi@reddit
this isn't a new phenomenon. maybe exacerbated by AI but devs have always reached for shiny new things in the hopes that they will make their lives easier.
Beginning_Occasion@reddit
The quote's context, however, paints a bit of a different story:
Putting this together with the "Your Brain on ChatGPT" paper, it could very well be the case that the one 50+ hour Cursor dev essentially dumbed themselves down (i.e. accrued cognitive debt), causing them to be unable to function as well without AI assistance. Not saying this is the case, but it's important that we have studies like these to understand the impacts our tools are having, without all the hype.
ZealousidealPace8444@reddit
Yep, totally been there. Early in my career I thought I had to chase every new shiny tech. But over time I realized that depth beats breadth for building real impact. In startups especially, solving customer problems matters way more than staying on top of every trend. The key is knowing why you’re learning something, not just learning for the sake of it.
TooMuchTaurine@reddit
50 hours of doing something is nowhere near enough time to unlearn years of normal development..... but it IS enough time to learn how to use a new tool like Cursor effectively...
Suspicious-Engineer7@reddit
They needed to follow up this test with the same participants doing tasks without AI. I'd love to have seen that one user's results.
pl487@reddit
50 hours is nothing. That's a week of long days.
My intuition agrees with yours. I didn't start feeling really confident at it for several weeks.
wutcnbrowndo4u@reddit
Yea, I've been saying this consistently around here. The consensus here, that these tools are absolutely useless because they have weak spots, is mind-boggling. They may not fit seamlessly into your existing dev workflow, but it's ludicrous to use that as a bar for their general utility.
Simple-Box1223@reddit
Agreed. I don’t know where the benefit lands overall given the myriad factors, but with a little bit of experience you can easily gain a net positive boost in productivity.
ALAS_POOR_YORICK_LOL@reddit
Yeah that sounds about right and matches my experience so far
fuckoholic@reddit
The reason 90+ % of devs are using AI is because it saves time. The study is made by idiots, for idiots.
TheseDamnZombies@reddit
Honestly fascinating results. I imagined AI would be a net positive to productivity even if most engineers over-estimated their productivity.
I've been using AI to accelerate the development of a mobile app and I'd be surprised if it didn't make me a little bit faster than I would have been. But maybe it didn't. It's astounding to imagine I might have made more progress in 3 months without AI, when the main reason I felt confident in pursuing the task was because of the availability of AI to help me not get stuck. (I'm not really a mobile or frontend developer, just backend.)
I would like to see the nitty gritty of these stats though. Stats on if certain individuals were more or less productive in their use of AI (i.e. you can be good at it, but most people suck at it), or if certain tasks were sped up while certain other tasks were slowed down, etc. The question here is if it really just adds more overhead all the time no matter what, or if it simply needs to be wielded more incisively with better prompts or more selective tasks.
But even then, the averages matter. Think about all of the energy being used, water that's harvested for cooling... when an unassisted human was faster the entire time. Talk about a bubble, geez.
labab99@reddit
Although the sample size is pretty small, the findings aren’t hard for me to believe anecdotally. There have been times where I thought I was being smart by using AI to throw together a proof of concept, and while the presentation is fantastic, it quickly devolves into a slog of repeatedly explaining the same requirements as it spits out prescriptive, overly-complicated code.
If I had just slowed down and used my brain instead, things most likely would have gone much more smoothly.
mcglothlin@reddit
Oof. 16 devs? This should be noted a lot higher up! Something to keep an eye on but I'm going to take it with a grain of salt for now.
muuchthrows@reddit
The sample size of developers was small (16), but the number of ~2h tasks was not that small, around 250.
MCPtz@reddit
And what's great is they will be following up on this study.
They developed the framework and can hypothetically deliver quarterly reports.
Due_Satisfaction2167@reddit
Using AI to speed up software development is a bit like blindly inheriting someone else’s code, and that person just copy and pasted from stack overflow.
It’s nearly always faster to do it yourself than to try to understand someone else’s unique insanity.
Moloch_17@reddit
Interesting. Are there any actual studies comparing code quality? If the code is better it might be worth the slowdown. We all probably immediately assume it's worse but apparently we also assume we're faster.
DeterminedQuokka@reddit
Yeah, there are. The code quality is around 59% if you give AI at least 10 tries and take the one that works.
So worse than most experienced devs.
Moloch_17@reddit
I'm talking about a serious study where they compare quality metrics of code submitted by experienced and knowledgeable developers with and without using AI tools for assistance. Anyone can query an AI and copy code but we both know experienced devs are using it much differently
DeterminedQuokka@reddit
The study I was talking about was using leetcode metrics for performance and memory use. The actual readability on those is probably low. I don’t know the stats from the quality studies off the top of my head. But they do exist.
drcforbin@reddit
Got a link to one?
DeterminedQuokka@reddit
Yeah sure
this is the best of the leetcode ones I've read: https://arxiv.org/html/2406.11326v1
mostly because the N is really high which is great. It doesn't include any human intervention with the code though and is specifically using copilot which definitely has issues. The more interesting conclusions I found in it were around how many iterations you need to be confident you found a solution and the problems they were having with python specifically.
This one is on open source code with experienced devs so it's interesting it's real life kind of https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
It doesn't actually rate quality, it rates time to completion. The main proven takeaway seems to be that people can't estimate how long AI actually takes to complete tasks. They perceive it to be faster but it isn't objectively. The study is more surprised by this than I am. They didn't find significance directly on code quality of the final object, but they did find significance on the fact that they rejected around 44% of what the AI did.
This one also includes a pretty good overview of why they think a lot of the historical research is problematic. It is basically small problems that don't require any context, not unlike the leetcode one above. So AIs do better.
This one is testing a lot of office work and leaves a bit to be desired on the details https://arxiv.org/abs/2412.14161
But basically they made a bunch of agents using most of the current models and fed them office tasks and they failed a lot. They do seem to be particularly good at pm and software stuff comparatively. But that's like a 2-37% success rate depending on the model.
All this to say there definitely are also papers about how AIs are the best at code, because there is debate in every field. And there are "correct" ways to use AI which people can debate about. But there is a world where we have to accept the harder it is to use an AI the less likely people are to do it.
drcforbin@reddit
Thanks! I appreciate you taking the time to post
Perfect-Equivalent63@reddit
I'd be super surprised if the code quality was better using ai
Live_Fall3452@reddit
How do you define quality?
ares623@reddit
The same way we define productivity.
Spider_pig448@reddit
I wouldn't. It's a second brain giving you new ideas and teaching you new tools. As long as the engineer is using it correctly, I'm sure it results in better code overall
ghostwilliz@reddit
It's an autocomplete that trains you to stop thinking imo
Spider_pig448@reddit
Only if you use it wrong. It's an intern making suggestions that you take into consideration when designing your solution.
RadicalDwntwnUrbnite@reddit
So far research is showing that neither is the case.
An MIT study had groups of students write essays: one group could use ChatGPT, one could use web searches (without AI features) and one could use brain only. By the third round, the ones that could use AI resorted to almost completely letting the AI write the essay. Then on the fourth round they had the students rewrite one of their essays, and the group that used ChatGPT could barely recall any details from their essays. It included EEGs that showed deep memory engagement was the worst amongst those that used ChatGPT.
Another study did some math tests where the students using AI in the practice exam did 48% better than those that could not use AI, but in the actual exam, where they could not use AI, they did 17% worse. A third group had access to a modified AI that acted more like a tutor and did 127% better on the practice exams than those that had no access, but ultimately did no better on the actual exam (so there is potential there as a study aid, but it's no more effective than existing methods).
failsafe-author@reddit
I think my designs are better if I run them by AI before coding them. Talking to an actual human is better, but takes up their time. AI can often suffice as a sanity check or by detecting any obvious flaws in my reasoning.
I don’t use AI to write code for the most part, unless quality isn’t a concern. I may have it do small chores for me.
Thegoodlife93@reddit
Same. I really like using AI to bounce ideas off of and discuss design with. Sometimes I use its suggestions, sometimes I don't and sometimes just the process of talking through it helps me come up with better solutions of my own. It probably does slow me down overall, but it also leads to better code.
Moloch_17@reddit
Me too but I've been super surprised before
bogz_dev@reddit
i haven't, i've never been surprised-- people say about me, they say: "he gets surprised a lot" i don't, i've never been surprised
i'm probably the least surprised person ever
CowboyBoats@reddit
Boo!
bogz_dev@reddit
saw that coming from a mile away, you can't teach a horse to suck eggs
revrenlove@reddit
That's surprising
bogz_dev@reddit
skill issue
SuqahMahdiq@reddit
Mr President?
TheMostDeviousGriddy@reddit
I'd be even more surprised if there were objective measures of code quality.
StatusObligation4624@reddit
APOSD (A Philosophy of Software Design) is a good read on the topic if you’re curious
itNeph@reddit
I would too, but fwiw the point of research is to validate our intuitive understanding of a thing, because our intuition is often wrong.
vervaincc@reddit
The point of research is to validate or invalidate theories.
itNeph@reddit
Hypotheses, but yeah.
Kid_Piano@reddit
I would be too, but I’m also surprised that experienced devs are slower with AI.
Jeff Bezos once said “when the anecdotes and metrics disagree, the anecdotes are usually right”. So if the devs think they’re faster, maybe it’s because they are, and the study is flawed because the issues completed were bigger issues, or code quality went up, or some other improvement went up somewhere.
Perfect-Equivalent63@reddit
That's got to be the single worst quote I've ever heard. It's basically "ignore the facts if your feelings disagree with them". I'm not surprised they're slower cause I've tried using AI to debug code before and more often than not it just runs me in circles until I give up and go find the answer on stack overflow
Kid_Piano@reddit
In that situation, you believe AI is slowing you down. That’s not what’s happening in the original post: those devs believe AI is speeding them up.
2apple-pie2@reddit
the core is not taking unintuitive statistics at face value
lying with numbers is easy. if all the anecdotes disagree with the numbers, it suggests that our metric is probably poor.
just explaining that the quote has some truth to it and isnt “ignore the facts”, more like “understand the facts”. i kinda agree w/ your actual statement about using AI
Efficient_Sector_870@reddit
When the anecdotes and metrics disagree, abuse your human workers and replace as many as possible with unfeeling robots
DisneyLegalTeam@reddit
I sometimes ask Cursor how to code something I already know. Or ask for 2 different ways to write an existing code block.
You’d be surprised.
Abject-Kitchen3198@reddit
Sometimes, when I see how my code evolved, I wonder.
Beneficial_Wolf3771@reddit
This is r/ExperiencedDevs, we can admit here that code quality is more of an ideal to strive for than the reality we face day to day.
electroepiphany@reddit
skill issue
One-Employment3759@reddit
is what someone that never ships says.
electroepiphany@reddit
lol whatever you wanna tell yourself buddy. Some of us just write code that’s at least pretty good the first time
Beneficial_Wolf3771@reddit
Yeah. I myself write code that’s “pretty good” as do most of us. But that’s just usually all we have the time for. It’s the reality of programming as a job vs programming as a pursuit.
dontquestionmyaction@reddit
Yeah, even my worst colleague says that.
electroepiphany@reddit
Cool story bro
tikhonjelvis@reddit
code will never be perfect but code at real companies can absolutely be (much!) better or worse
honestly, it's pretty depressing how often I run into people who don't believe code quality exists—it's a tacit indictment of the whole industry
New_Enthusiasm9053@reddit
It's depressing how often people don't unit test. Code quality is also invariably poor because the dev doesn't get punished for using excessive state by having to write a boatload of tests.
SketchySeaBeast@reddit
Certainly, it's never gonna be perfect, but I think we all know the difference in code between "wtf?" and "WTF!?!!" when we see it.
ninseicowboy@reddit
A study evaluating “quality” of code seems tough. How would you quantitatively define “quality”?
SituationSoap@reddit
Google's way of measuring this was shipped defect rate, and that goes up linearly with AI usage.
ninseicowboy@reddit
Finally some good news regarding the SWE job market
kaumaron@reddit
https://devclass.com/2025/02/20/ai-is-eroding-code-quality-states-new-in-depth-report/
Moloch_17@reddit
"an estimated reduction in delivery stability by 7.2 percent"
Code reviews are probably the only thing keeping that number that low
RadicalDwntwnUrbnite@reddit
The product my employer sells is AI based (it's ML/DL, not LLM/GenAI), but we've "embraced" AI in all forms and using Copilot/Cursor is encouraged. As an SWE who is also basically the lead of the project I'm on, I've shifted a significant amount of time from doing my own coding and research to reviewing PRs. I find myself having to go through them with a fine-tooth comb because the bugs AI is writing are insidious; there is a lot of reasonable-looking code that gets rubber stamped by my peers, and I've basically resorted to pre-blocking PRs while I review them.
Moloch_17@reddit
That's something I've noticed too. On the surface the AI code looks pretty clean but there's little logic errors often times that will trap you.
RadicalDwntwnUrbnite@reddit
I've seen so many "this works as long as we never need more than 10 items, that's like 2 more than most people use right now" jr. dev style mistakes.
Suspicious-Engineer7@reddit
Shit 7.2% is huge already
Moloch_17@reddit
I expected it to be higher honestly
TheCommieDuck@reddit
this is grasping at the vague mention of straws in a 10 mile radius.
According_Fail_990@reddit
Concepts of a straw
SituationSoap@reddit
Google's studies have shown that a 25% increase in AI usage correlates to a 7% increase in defect rate, pretty linearly.
drnullpointer@reddit
There are studies. As far as my understanding goes, they show an initial productivity boost followed by a slow productivity decline, exactly due to code quality.
The biggest code-quality problem I understand is happening is that people relying on AI are biased against fixing existing things. AI is so much better (so much less bad?) at writing new code than at refactoring an existing codebase. Therefore, you should expect teams with significant AI contributors to accumulate more technical debt over time, in the form of a larger amount of less readable code.
garlicNinja@reddit
Yeah let's see the long term tech debt study
bishopExportMine@reddit
Sample size of 16 devs...
Cahnis@reddit
I feel like waiting on agents to do their thing kills my flow. I take almost as much damage as from context switching. Every time the AI starts doing its thing I alt-tab, or I get up to get a drink.
Unfair-Sleep-3022@reddit
I haven't seen anyone actually experienced thinking that
femio@reddit (OP)
Profile on the devs in the survey:
OccasionalGoodTakes@reddit
That seems like way too small of a sample size to get anything meaningful.
Sure it’s a bunch of code, but it’s from so few people.
BatForge_Alex@reddit
They mention this in the article:
electroepiphany@reddit
Might not even be a bunch of code, tbh; that just means the chosen devs contributed to a big repo. It says nothing about their individual contributions.
FamilyForce5ever@reddit
Quoting the paper:
micseydel@reddit
Big corps could work together to put out a better data set. I'm sure they would, if the results were good.
SituationSoap@reddit
One of the biggest smoking guns about the actual unit economics of AI adoption is the fact that there isn't a single non-startup case study for AI adoption making companies a bunch of money.
Careful_Ad_9077@reddit
16?
The corpse of the father of statistics is rolling in his casket.
AssociateBig72@reddit
This study highlights a really important, albeit counterintuitive, point about AI in development right now. It's easy to get caught up in the hype, and the reality that AI can slow down experienced devs is a crucial observation about current tools and workflows. The core issue might be that many current AI tools are designed to augment existing coding paradigms, which can add overhead like prompt engineering or code review. Perhaps the true leap in productivity with AI isn't in making experienced coders faster, but in empowering entirely new groups to build without traditional coding constraints. It shifts the paradigm from 'AI helps me code better' to 'AI codes for me'. This perspective is why we focused fn7 on enabling non-technical founders to build apps and websites just by chatting with AI, rather than trying to speed up existing developer workflows. Our goal is to make creation accessible, not just marginally faster for those already proficient.
hubbabubbathrowaway@reddit
n == 1 here, so grain of salt etc.
From my personal experience so far, I feel a lot faster when using LLMs to write code, but I've found that my time spent debugging afterwards has exploded. Even if I stick to using it as a "better autocomplete", the code generated looks good, but there's an off-by-one here, an edge case there... and if you use the LLM to write the tests too, you're done for.
And even if the tests are correct, and the code works, often there's a better implementation, think readability, performance, or even more importantly: Matching what the Juniors are able to understand at the moment.
Code is written once, modified a few times, but read hundreds of times. Programming with an LLM makes the part with the least "runtime" feel faster, but the rest suffers.
I hope this will improve in the future, but right now I don't believe it. LLMs are trained on publicly available code, and most publicly available code on the net is shit, plain and simple. For every beautiful SQLite source, there are hundreds of hastily cobbled together crap repos poisoning the LLM, and I doubt that's gonna improve...
Typicalusrname@reddit
AI is good at certain things. If you use it exclusively for those, yes, it does make you faster; I'd wager around 15-20%.
Bobby-McBobster@reddit
Commenting this on a post about a scientific study that SHOWED it makes you slower and SHOWED that it makes you THINK that you're faster is really classic /r/ExperiencedDevs.
Franks2000inchTV@reddit
Taking a single study that confirms your priors as Gospel truth is peak Luddite behaviour.
Bobby-McBobster@reddit
This is by far not the only study that has demonstrated that GenAI has a negative impact on productivity.
GoonOfAllGoons@reddit
Well, gee, one single study and I guess it's settled, right?
I'm tired of the AI hype, too.
To say that it automatically makes you slower and dumber no matter what the situation is a bad take.
sciencewarrior@reddit
From what I understand from the article, those tasks were all of the same type: Changes in large codebases (LLM quality falls with more than ~100,000 lines of code.) The developers were intimately familiar with those codebases, meaning they were already productive unassisted. Tell them to start a greenfield project, and you're likely to see different results.
goldenfinch53@reddit
On a study where half the participants hadn’t used cursor before and the one who had the most experience also had the biggest productivity boost.
nomadluna@reddit
This is too good, gotta be trolling. For your own sake I hope you are
IDatedSuccubi@reddit
It's really bad at C, can't even pass static analysis and/or sanitizers after a simple request, absolutely no use.
But I found that it's really good at Lisp, really helped me recently. Definitely 2x'd my productivity just off the fact that I don't have to google usage examples for uncommon macros or odd loop definitions.
hardolaf@reddit
I've found it to be very good at complex refactoring tasks when only considering the next 1-2 changes needed after I manually start the change. It easily speeds up the typing portion of my job by 50%, which is to say around a 2-3% total productivity increase, but like a 30% reduction in unnecessary RSI-inducing work.
angriest_man_alive@reddit
I use it exclusively to format data for me. I know for a fact it can do it faster than I can. So I don't think it slows me down, but that's because I'm not using it for anything more than a glorified universal formatter.
phoenixmatrix@reddit
I definitely take longer with AI on individual tasks. Because I'm doing 6 of them at a time while doing a bunch of non coding stuff.
I can launch a bug fix via Devin while in a meeting (that I'm actually paying attention to).
In absolute terms it takes much longer. But they're tasks I would not have done at all.
There's also a big gap in terms of AI usage. It's new, and some (most?) people are really bad at it.
Krom2040@reddit
I'm honestly mystified whenever I hear about people integrating LLM code generation directly into their dev process. I absolutely love LLMs as a way to generate a basic outline of methods using APIs I'm not very familiar with, much like a drastically improved version of Stack Overflow, but then I still end up writing the code according to my own preferences and making sure that I reference the API docs whenever I see methods or patterns that I'm not already confident about.
LLMs are a wonderful tool, but it's just a foreign concept to me that you would include any code in your project where you don't essentially understand the underlying intent and behavior.
Franks2000inchTV@reddit
So maybe just... read the code?
Like AI-written code isn't in hieroglyphs. It looks exactly like any other code.
Higgsy420@reddit
I have had this same thought recently. My company bought us Claude subscriptions but honestly I'm probably not going to use it.
Franks2000inchTV@reddit
Claude Code with Opus is a huge step up. I would HIGHLY recommend trying it and ignoring all the naysayers.
SuspiciousBrother971@reddit
It's composed of 16 open-source developers from major projects. These individuals are significantly above par compared to the average developer. They also didn't use Claude 4 or Max Opus, currently the best models.
These results don't surprise me; the better programmer you are, the worse results you will get with these models.
Franks2000inchTV@reddit
Yeah Opus is the first model I trust.
If I ever hit the Opus usage cap, I stop using Claude for work that matters.
Like I'll use Sonnet to ask questions about the codebase, or to write small simple functions, but I don't let it write any significant code that will be committed.
vvwccgz4lh@reddit
Thank goodness GitHub said this about Copilot:
Code 55% faster with GitHub Copilot
RustyGlycan@reddit
I have some concerns about the study: it doesn't report whether the findings are statistically significant, which is a huge red flag for me.
Additionally, the fact that the times taken to complete a task are self-reported, and that only one of the devs has more than a week's worth of experience using Cursor, makes me a bit suspicious.
That said, the really fascinating thing is Figure 6, which shows that the time spent on all the other parts of the job (researching, coding, etc.) decreases, but that decrease is offset by prompting, waiting for the AI, and reviewing AI code.
It's almost like if writing with a computer was quicker than a typewriter, but the time spent getting your printer to work made it slower overall.
RecursiveGirth@reddit
I love nothing more than throwing a problem into deep research and moving on to the next issue while waiting for a response. Some of you are behind the curve...
AI is a tool, use it... or you will lose it. ("it" being your fucking job)
Lonely-Leg7969@reddit
What a terrible take
RecursiveGirth@reddit
Why, Because you disagree? What objective points do you have to refute my "terrible" take?
VastlyVainVanity@reddit
Because AI bad, bro. Look where you are, Reddit is Luddite Central when it comes to AI.
Lonely-Leg7969@reddit
You don’t have to be a Luddite to not fall for hype.
Lonely-Leg7969@reddit
I just think the assertion that LLMs are a requirement to keep your job is incorrect. Having used LLMs for a bit myself, I’ve found it to be a distraction to actually getting things done. If you use it and you meet your intended work objectives, more power to you. If you don’t and you still meet your objectives, then no I don’t think you’re in line to lose it.
Arkanin@reddit
I tend to use LLMs for code reviews as much or more than the actual code. I have to handpick their feedback since it is often wrong. For me this results in slightly slower development but way fewer bugs.
MesmerizzeMe@reddit
I guess ultimately even with the best AI imaginable, there is still so much ambiguity left in language that specifying every detail is a ton of work. If only there was a way of talking unambiguously to a computer it would revolutionize vibe coding.
menckenjr@reddit
I think they call that "code"...
Venisol@reddit
God I fucking hate that sentence. They create a study backed by methodology and evidence and just instantly throw in a totally baseless "yeah, for sure things are gonna massively improve". WHY?
WHY IN THE FUCK DO YOU THINK THAT? LLMs have been the same for coding for 2 years. They're stagnant. Why would you say that? People are so fucking conditioned to excuse the state of LLMs it's ridiculous.
abeuscher@reddit
I really think LLMs appeal to gamblers and people with that gene. I notice it in myself if I am not paying attention; they trigger this dopamine loop where each answer is almost the one you need, and you get sucked down a hole of promises.
I have 25 YOE and I do notice that while I feel good about using LLMs to help me plan and learn, I immediately become frustrated when I try to get them to generate any kind of complex code above like a regex.
But I do think there is an active dopamine loop in LLMs which causes this false confidence.
Fireslide@reddit
Yeah, there's definitely that element to it: if I just build the prompt right, this time it'll generate what I want and I can move on to the next feature.
When you're on a win streak of getting the answers you want out of a prompt first try, multiple times in a row, it feels great. Velocity is huge. But when it fucks up the folder paths for building a Dockerfile or something, or continually hallucinates modules or features from an old API that don't exist, you realise you've just wasted 30 minutes you could have spent reading the docs and solving it yourself.
The last year or so for me has been working out how to incorporate them into my workflow to be productive. It's about getting a feel for what I can trust them to do first try, what I need to get them to build a plan for first, and what I just won't trust them to do because their training data lacks density, or its density is for an older version of what I'm using.
MoreRopePlease@reddit
Funny, I have this same thought pattern when dealing with some contractors and coworkers. "Dude, ok I wasn't super explicit about this one thing, but if you think about it for one second, shouldn't you test use case X? And if you do, then it's obvious your solution is incorrect."
Rodwell_Returns@reddit
It depends what you are making. If you are making something standard, "AI" really helps. Prime example I suppose are basic websites.
Personally I am making highly specific software that "AI" knows absolutely nothing about, so it is purely a detriment in my case.
mistaekNot@reddit
I suspect this study is flawed somehow. AI is not almighty yet, but it's pretty fucking good. Prompt it at a class/function level with appropriate context and you're gonna have a good time.
KallistiTMP@reddit
I think this makes sense, honestly.
Like, the parts of coding that AI can actually reliably accomplish are usually warning signs of bad design.
"This saved me 3 hours typing out boilerplate methods" - my brother in christ, have you ever heard of class inheritance?
I don't know about everyone else, but when I start typing too much code manually, that usually sets off a little warning light in my head that I probably need to pause for a moment and rethink my approach.
I can definitely see myself rapidly blowing past that warning light and ultimately creating more work for myself if I had a low effort universal boilerplate generator always within my immediate reach.
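To be fair about what I mean by that, here's a contrived sketch (hypothetical names, TypeScript for illustration): the page of near-identical methods an LLM will happily generate per entity collapses into a tiny generic base class.

```ts
interface Entity {
  id: string;
}

// One small generic base class instead of copy-pasted (or AI-generated)
// save/findById methods on every repository.
abstract class Repository<T extends Entity> {
  protected items = new Map<string, T>();

  save(item: T): void {
    this.items.set(item.id, item);
  }

  findById(id: string): T | undefined {
    return this.items.get(id);
  }
}

interface User extends Entity { name: string }
interface Order extends Entity { total: number }

// Each concrete repository is now one line instead of a page of boilerplate.
class UserRepository extends Repository<User> {}
class OrderRepository extends Repository<Order> {}

const users = new UserRepository();
users.save({ id: "u1", name: "Ada" });
console.log(users.findById("u1")?.name); // "Ada"
```

Inheritance isn't the only fix (composition works too); the point is just that the "saved typing" was a design smell in the first place.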
nachohk@reddit
Eh. I don't use LLMs to write code, I use them more like I use a search engine. Provided you understand its strengths and limitations and how to make effective use of it, LLMs can absolutely speed things up. Getting a brief, direct answer from an LLM for "Remind me what the standard library API was for this in X language?" is indeed faster than doing a search, combing through SO, or scrolling through a page of docs. (Although I may be unusual in how I actively work with enough different languages that this is something I often need to be reminded of.) They also have a success rate at least as good as searching and checking the first several results for quickly elaborating on shitty cryptic error messages.
But yeah, no, I'm not surprised if it's slower when instead you're just doing that obnoxious back and forth of trying to get an LLM to stop forgetting requirements and output code that makes sense.
matthra@reddit
Did you guys know that a sample size of 16 developers is large enough to make sweeping claims about how all developers function with AI in an unfamiliar repo?
But AI bad, so up vote away.
chrisza4@reddit
You're the one making a sweeping claim and projecting it onto everyone else.
femio@reddit (OP)
Actually, from the study:
The study is more oriented towards 1) quantifying how reliable self-reported productivity gains are, and 2) identifying any constant inefficiencies in LLM tooling (which appear to be the inherent requirements of writing prompts, more time spent on testing, and more time spent reviewing code).
VastlyVainVanity@reddit
Study confirms your biases (if it didn’t it’d get downvoted to hell on this sub).
I honestly don’t care about studies that get upvoted on Reddit lol
Imnotneeded@reddit
AI is still in "bro" mode. Like NFTs and crypto, it's pushed as the ultimate solution.
nacholicious@reddit
And pushed by salespeople who are less qualified than your average engineering intern, rather than listening to actual engineers
ghostwilliz@reddit
Anecdote ahead
At my old job, they added Copilot. It worked in VS Code for work and Visual Studio for my personal projects.
The place was going downhill and everyone was just using copilot. Our team sucked ass at that point.
I got lazy and started using it in my personal projects.
Anyways, I got laid off and Copilot stopped working. I was a moron for about 2 days, but once I got used to it, I was so much better than when using Copilot.
It trains you to stop thinking. The code I produced with it was ass and I made a lot of code but never really got anything done.
I breezed by everything I was stuck on in my personal project now that copilot was gone.
I don't think I'll use ai tools again
wachulein@reddit
It took me some time, but I think I finally arrived at an AI-aided dev workflow that feels like having a small team executing tasks for me. I wasn't feeling very productive before, but now I can't wait to keep the flow going.
Historical_Emu_3032@reddit
faster, faster, faster.
I'm not going anywhere near companies like this.
handmetheamulet@reddit
Neat
mwax321@reddit
Ok, but who's actually reading the study? It's based on the devs' estimates of how fast they would complete it with/without AI.
Shadowys@reddit
LLMs suffer from a rapid and silent degradation after the first message, which is well documented. Using the same agent for a prolonged context (such as an agent doing a task with a validator) is bound to run into this issue, and the errors will compound exponentially, resulting in longer dev time and higher token usage.
Personally I found that HOTL (human-on-the-loop) performs way worse than HITL (human-in-the-loop). You need the human to perform first-principles analysis on the problem as much as possible so they can verify the solution from the AI.
no_spoon@reddit
As a senior dev myself, I feel like it’s way too fucking early to make this call. All of us are still learning how to incorporate these tools into our workflows. Stop drawing conclusions, it’s annoying.
FortuneIIIPick@reddit
For simple things, 20% faster might be about right. For anything of serious complexity, I'd say -20% is being generous.
Nodebunny@reddit
They're always giving me shit for answers.
Strus@reddit
For me personally I don’t care if AI is slower than me - I use it for things I don’t want to code myself. Boilerplate, linter issues in legacy code, one-shot scripts, test data, data manipulation etc. I probably could do all of this faster myself, but I just don’t want to do it at all.
Far-Income-282@reddit
It also lets me context switch between all those shitty things.
Like I feel like I might be 20% slower on any one project, but now I'm doing 4 projects at 20% slower, so maybe 4 projects in 4 months, whereas before I'd do one project in 3 months and then spend 1 month complaining about not wanting to write tests anyway.
Which, now that I say it, AI has actually made me like doing test-driven development. It makes it way easier to write the tests first and check the AI against them.
Now that I write it that way... I wonder how many of the people who used AI in that study realized it makes all those best practices (like TDD) that we all knew we should have followed but didn't easier, and also sets the repo up for faster AI success later. Or are they still coding like they're the ones in control?
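Roughly what that looks like for me (a made-up example with vitest-style assertions; `slugify` and its behaviour are invented here): I write the test first, then let the assistant fill in the implementation and judge it against my tests rather than its own.

```ts
import { test, expect } from "vitest";
import { slugify } from "./slugify"; // implementation left for the assistant to fill in

// The behaviour I actually want, written before any generated code exists.
test("lowercases and hyphenates", () => {
  expect(slugify("Hello World")).toBe("hello-world");
});

test("strips punctuation and collapses whitespace", () => {
  expect(slugify("  Legacy & Large,   Codebases! ")).toBe("legacy-large-codebases");
});
```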
SketchySeaBeast@reddit
"Create unit tests, and pretend you're wizard while you do it."
"OK, now take all the wizard references out."
jakesboy2@reddit
mine calls me “my lord” and the role play is worth the cost alone
TheMostDeviousGriddy@reddit
You must type really fast if you're quicker at the boilerplate stuff. For me personally, the only way AI would be slower than I am is if I'm doing something out of the ordinary, in which case I know better than to ask it; and if I do get desperate enough to ask, it tends to bring up information that can help guide a Google search. I have seen it just make up methods that don't exist, though, so that can waste a lot of your time if you lean on it.
Open-Show5557@reddit
Exactly. The cost of work is not wall-clock time but mental exertion. Offloading mental load, even if it takes longer, is worth it so you can spend your limited mental resources on the highest-leverage work.
DeadButAlivePickle@reddit
Same. I'll sit there for 10 seconds sometimes, waiting for Copilot to come alive, rather than fill some object fields in manually or something. Lazy? Sure. Do I care? No.
awkward@reddit
Most of my prompts get written after 4pm as well.
xusheng2@reddit
I think the key detail in this study is that all of the developers here are "experts" in the codebase. I've always felt that the biggest speedup AI gives is in helping reverse-engineer or explore a part of the codebase that I'm still learning about.
Forsaken-Promise-269@reddit
Guys it’s a skill that needs to be adopted just like any other
ie see this: https://claude.ai/public/artifacts/221821f0-0677-409b-8294-3...
remimorin@reddit
It does make sense; reading and debugging code is as mentally exhausting as writing it.
For a lot of "production code" I found it easier to do it myself.
I try to get better with LLMs, but I frequently find that avoiding the "overachieving" and the unrelated changes requires more work than just doing the job.
But then again, if I were learning a new language, I would say "I am so much more efficient in this other language where I am familiar with the whole ecosystem."
So I believe as time passes we will develop good practices and improve tooling around LLMs.
Also, LLMs have lowered the learning curve of a new tech by a lot. With them I am more efficient while learning.
Finally: boilerplate, one-time scripts and such (others have made a better list).
Schmittfried@reddit
I definitely take less time using it because I don’t use it for problems that take longer to find the right prompt than just solving them myself.
failsafe-author@reddit
This isn’t what I use AI for.
lookmeat@reddit
This makes intuitive sense. It's the classic Waymo vs Google Maps dichotomy: Google Maps offers routes that are actually faster, but Waymo feels faster. Google Maps will pull you through traffic and make you stop at key points, but it's still the fastest route overall. Waymo tries to avoid the frustrating experiences that make you feel slow but are actually the setup needed to go as fast as possible.
BTW I really appreciated that the article has a table specifying what they are not claiming, and the real scope of the context. It's so important (especially in ML research) that I want to quote it here:
That's just so nice that I now wish many papers, and every scientific article, had a table like this at some point shortly after the introduction/abstract.
Also, let's be clear (in the same spirit as the table above) that this post is just speculation and intuition on my part; none of it should be taken as established.
It makes sense though. AI speeds you through a lot of things, and if you have a good enough idea of what you want, it will give you a good enough solution. I feel that seniors sometimes miss that when they hand work out to mid-level and especially junior engineers, it already comes with patterns and references that help build a mental model the engineers can follow when they make their own thing. It may look different, but it still fits within the same model. LLMs are the opposite: they only make things that look the same, even when they don't fit the model at all. To compound the issue, engineers are throwing LLMs at code that is still too early-stage to make work, and you have to go back and fix these things. The conventions and tricks earned to guide engineers just won't work with LLMs.
And honestly, anyone who's gone to a serious programming-language discussion learns that what really matters is not the syntax but the semantics, the meaning of things. LLMs understand language at a syntactic level perfectly, but not at a semantic one. They don't understand what a word on its own means, but rather the relationship it has with the words around it and what goes next.
Now I think that agentic AIs need a lot of work to get good and useful. They are too mediocre and dumb, and you're better off doing it yourself many times. Ultimately it's the same balance of automation we've had before just tweaking the prompt rather than the script.
And I do think that agentic AIs have their value. The obvious one is as code analyzers (what static analyzers do nowadays). Less obvious, I believe, is automated code improvers. Whenever I make a change in my library (be it an open-source library, or one used by others) that deprecates code, or now prefers that something be done a new way vs the old, I include a small piece of documentation on how to change the old way of doing it to the new one, as part of the release documentation/notes/commit description. Then an agent on a downstream library can pick up on this and create its own PR updating the downstream library's use of your stuff for you. Sure, the library author has to care enough to make the code changes easy for an LLM to apply, but this isn't new; I already tend to write code changes in a way that is awk-friendly so that it's easy to do automated changes on downstream libraries as a janitor.
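A rough illustration of the kind of migration note I mean (hypothetical function names, TypeScript here, not from any real library): the old entry point stays, marked deprecated, with the mechanical rewrite spelled out so a downstream agent or codemod can apply it.

```ts
/**
 * @deprecated since v2.3: use `fetchUser(id, { includeOrders: true })` instead.
 *
 * Migration (mechanical, safe to automate):
 *   fetchUserWithOrders(id)  ->  fetchUser(id, { includeOrders: true })
 */
export function fetchUserWithOrders(id: string) {
  return fetchUser(id, { includeOrders: true });
}

export function fetchUser(id: string, opts: { includeOrders?: boolean } = {}) {
  // ...the real implementation would live here; this stub just echoes the shape.
  return { id, orders: opts.includeOrders ? [] : undefined };
}
```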
But that kind of hints at the thing. None of those things "speed up developers" as the idea goes. Rather, they simply free up time for developers who are valuable but struggle to explain that value (yet companies that lack these developers struggle really badly).
RedbloodJarvey@reddit
I feel like I end up spending more and more time crafting a message for the LLM.
Sometimes an LLM's ability to troubleshoot an issue is almost magical.
But there is a point of diminishing returns when a problem is so complex or subtle, it would have been faster to just figure it out the old fashioned way: pulling out the debugger and walking through the code line by line.
But it's hard to tell upfront if you just need to give the LLM a little more context, or if it's just never going to understand the current problem.
This LLM babysitting gets very mentally draining.
ViveMind@reddit
False
thorax@reddit
The number of useful projects I've coded that I never would have even started is insane. I'm so much more productive coding things than ever before.
It might be true that deep legacy codebases or inexperienced "vibe" coders will be slower, but once you understand its strengths and weaknesses, it becomes an incredible productivity tool.
Cyral@reddit
Insane that you are being downvoted. It's so helpful for prototyping. I can try a new API like WebTransport or WebGPU and get off the ground in five minutes. So helpful for testing whether something is viable without dedicating half the day to it.
digitizemd@reddit
The majority of work in my career has been working on large, legacy code bases.
codemuncher@reddit
I do think that LLM coding has had a big impact on starting new projects, which is great!
I also think of my experience at big companies, the kind most people would die to work for: there, that kind of "start a new project" work is basically zero. I spent nearly all my time changing existing large systems, and LLMs do not do well on that.
And existing large systems mean 500kloc+. None of this "10k loc is big" nonsense.
LLMs just can't fit a huge codebase in context. Maybe one day?
thorax@reddit
Right, you go in understanding those limitations. You still have to know your codebase; you just get to make the surgical changes you understand. You're much less able to tell it to "code a whole feature" and much more likely to get it to do cleaner documentation, tests, and autocomplete on steroids. You just can't go in expecting it to understand 100k LOC, and where it keeps getting confused, set up docs that give it examples to avoid the confusion.
It's no silver bullet, but there's a tremendous amount of tasks you can rely on it for. You just have to be proficient with the tools enough to avoid having it work on areas that it doesn't understand.
You still have to be the developer who knows your codebase, but you have a newbie intern who can cover the easy bases at least.
zulrang@reddit
The only use an experienced dev should get out of an LLM for coding is having it type out code they already have arranged in their head, and writing documentation.
Ffdmatt@reddit
Probably because we still have to read the code and make sure it makes sense, etc.
I'm sure the LLMs will improve and may even be damn near perfect every time, but I still can't imagine a serious developer just accepting everything and never reading or planning.
I'm not sure you could ever fully optimize for this latency. When before it was just a single mind running through an idea, now you have to stop to read the LLM's thought process and balance that with your original vision.
forbiddenknowledg3@reddit
Well, I keep seeing people use AI for tasks you could already automate. Most people just never bothered to learn find-and-replace with regex, for example.
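For example, the kind of mechanical rewrite people reach for an LLM to do (illustrative Node/TypeScript, names invented): renaming a call site is a capture-group replace, no model required.

```ts
// Rewrite every `getConfig("key")` call to `config.get("key")`:
// the capture group carries the argument across unchanged.
const source = `const a = getConfig("timeout");
const b = getConfig("retries");`;

const rewritten = source.replace(/getConfig\((".*?")\)/g, "config.get($1)");

console.log(rewritten);
// const a = config.get("timeout");
// const b = config.get("retries");
```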
Individual-Praline20@reddit
I would have thought it would’ve been at least 50% slower frankly. That’s what I anecdotally found with AI freak colleagues. 🤣 I’m laughing at them on a daily basis for using that crap
Blasket_Basket@reddit
Interesting results, but I don't know how much I trust this study. n=16 is a pretty small sample size, and I'm not sure how representative seasoned experts working in a codebase they're deeply familiar with are of SWEs in general.
Existing research has already shown that for true experts, AI actually hurts more than it helps, but this is not true for everyone else. I would posit that these results align with those previous findings, but would need a much bigger sample size and further segmentation to be able to make a statement as general as "AI makes devs 20% slower". What about jr or mid-career devs working on blue sky projects, or onboarding into a section of the code base they aren't familiar with, or using AI for incremental productivity gains like Unit Test coverage or generating documentation?
These findings may well be true, but I think the headline here oversells the actual validity of the findings of this single study.
Yeti_bigfoot@reddit
My initial thoughts weren't positive when I played with an ai assist tool.
Admittedly, only for half an hour. But in that half hour I found it was quicker to do the little stuff I was playing about with myself.
Maybe it'll be better for bigger changes, but then I'll want to check all the code, which will take time: time I could've spent writing it.
When I want to change something I'll be reading someone else's code and have to learn where everything is rather than knowing the code architecture because I wrote it.
I'll try it again at some point, I'm probably just not using it very well.
itNeph@reddit
Hypothesis, but yeah
kittykellyfair@reddit
Do they take more time but also get more done, and write code with better edge-case protection and code coverage?
When I use AI it's like having my own personal, very competent junior engineer. I give them good requirements and they write the lines, but I also take the time to review their work and direct them to make adjustments, explain what I want tested, and verify it all. Compared to the entire development process including MRs, I feel like I get a lot more done, and better. But maybe I'm just another data point supporting the OP research.
geon@reddit
Uncle Bob claims the inverse about unit testing: it feels slower but is actually faster.
According_Fail_990@reddit
We have over 70 years of quality management studies showing that eliminating sources of error is far more effective than trying to fix error mid-process.
If you want an argument as to why LLM dumb dumb, that’s it - it isn’t worth speeding up the coding process if it slows down the debugging process.
sinnops@reddit
You just spent more time writing a prompt and then continually adjusting the output to optimize it, when you could have just written it yourself in less time.
GoonOfAllGoons@reddit
Open source projects and they used 2 hour tasks as a baseline?
AI is used to streamline tedious stuff that takes much longer. Not a fan of their methodology.
przemo_li@reddit
Change my mind: LLMs, as the non-deterministic tools they are, are uniquely hard to reason about. This means that our discipline's famous lack of objective measures is plunged even deeper into chaos; now we can't even be sure of our own anecdotes, however little they mean even with deterministic tools.
TacoTacoBheno@reddit
Maybe I'm just a prompting bozo, but asking Claude to generate a sample JSON based on my POJOs never quite worked. "Hey, you forgot to include the child objects." "You're right, here you go" and then the same junk, and it invented fields and typed things incorrectly.
ZombieZookeeper@reddit
It's either AI or trying to get an answer on Stack Overflow from some arrogant ass with a profile picture of themselves kayaking. Bad choices all around
drnullpointer@reddit
It does not matter.
There are long term effects of using AI that I think far outweigh the initial 20% this or that way.
I think people relying on AI will simply forget how to code. I think I can make that assumption because the same happens with most other skills.
But coding also contributes to other skills, like systems thinking, technical design, and problem solving.
I think that over time, people who rely on AI will start losing a bunch of related skills, at least to a certain degree. And new devs who grow on AI, will never really learn those skills in the first place.
Adept_Carpet@reddit
What's interesting is that open-source development represents a best-case scenario for LLMs: this is what they were trained on (including documentation, issue histories, etc.).
The work I do requires a lot of contextual knowledge and proprietary software so it's not a surprise that LLMs can only nibble around the edges. But I would have guessed that they would be good at open source libraries.
lyth@reddit
This isn't necessarily a fair measure. "Finished the ticket" isn't always the same as "and wrote really good test coverage, with a really good tech-debt-to-feature-completeness ratio."
I appreciate that the "create a method in a CRUD controller" task I built out the other day could have been done a lot faster, but holy shit, the bells and whistles on the version I delivered were 👨🏽‍🍳💋
pwnasaurus11@reddit
This is an interesting study, but there are so many variables that are not controlled for:
- how experienced are these developers with the AI tools?
- have they used these tools in the specific codebase before? Did they have the right rulesets set up?
- were the kinds of tasks they were working on effective for AI?
I can tell you with certainty that I am seeing a lot of value out of AI tools in my day to day job. It's not great for everything, and it is truly a new skill you have to learn how and when to use (and when to give up on it).
It helps me write scripts (I don't know bash well), helps me switch between languages way more quickly (autocompleting syntax I forget about), and is fantastic at doing specific refactors.
It's also already outdated — Claude 4 + Claude Code are showing huge leaps in agentic capabilities over Cursor + Claude 3.5.
I have 15+ years of experience, am an L7 in big tech, and I just don't think this is correct.
eat_those_lemons@reddit
Wait, they were using Cursor and Claude 3.5? The coding-ability jump from 3.5 to 4 is huge. No wonder they got such poor results.
And as someone said, context is really what we manage now, and that's vastly different from the prompt engineering of before, so I'm definitely curious how many of the developers were using best practices.
throwawayskinlessbro@reddit
That isn't a truly measurable thing. On top of that, you'd need a vast control group to truly understand the numbers IF you were to genuinely take a stab at something as intangible as this.
Now, don’t get me wrong. I love to hate AI too - but just not like this.
itCompiledThrsNoBugs@reddit
I think this is an interesting result but the authors point out in the methodology section that they only worked with sixteen developers.
I'll reserve my judgement until more comprehensive studies start coming out.
teerre@reddit
I'm part of a study group at BigCompanyTM coming up with new interview methods that take LLMs into account, and it's interesting: we often see engineers taking longer when they rely on the LLM, even engineers who certainly know exactly what to do on some questions. There's no conclusion yet, but it's clear there's a difference between questions where one prompt gets the answer, which is obviously faster, and ones you have to iterate on, which are often considerably slower.
eat_those_lemons@reddit
Is that just findings on small programs for interviews?
(I.e., I wonder if better prompting for better one-shot ability would improve the metrics.)
Also what level are the problems? Leetcode easy? Hard?
ILikeBubblyWater@reddit
So using AI made them more money
maccodemonkey@reddit
I've been wondering - If senior devs are faster than an AI, but AI prevents people from upleveling their skills to a senior level, are we killing our own efficiency?
Sure, the AI can do something you haven't learned yet. But if you learn it - you could be better than the AI.
This is where the rubber duck approach seems best. The AI is helping you learn something new to up level yourself.
psycho-31@reddit
I didn't see the article mentioning what counts as AI usage (please correct me if I'm wrong). One can:
1. Prompt AI for the majority of smaller tasks. For example: create a method that does such and such, or add tests for this class that I just added.
2. Have AI enabled and use it as "autocomplete on steroids".
Groove-Theory@reddit
My theory comes from something the article mentioned: that AI performs worse in older and legacy codebases.
I think the anecdotes come from the fact that AI initially reduces cognitive load on developers, and that reduction in initial cognitive load makes it seem, by gut feel, that productivity has increased. Seeing AI get something seemingly correct, especially in a large, anti-pattern-riddled codebase, is a huge relief to many, whereas having to sit down and implement a fix or feature on a brittle codebase would be a frustrating endeavor.
codemuncher@reddit
Reduced cognitive load could also be thought of as "I don't understand how my code works anymore", which is an interesting way to do engineering.
The headlines make a lot of noise about "tedious code", but for most real engineering tasks the hard and tedious part isn't actually turning ideas into code; it's dealing with the fuzziness of the real world, business requirements, and the ever-changing nature of such things.
rebuilt@reddit
It could be the case that devs were actually sped up by 20% when generating code, but were slowed down by a much larger margin later when they had to modify, understand, and debug the AI-generated code.
elforce001@reddit
This is an interesting one. The main issue I've encountered is that these assistants are addictive. I felt like I was going 1000 mph, but then you start slowing down hard too. You invest more time trying to be specific, double-checking that the answer is still consistent, and the next thing you know you've spent more time "debugging", etc., going from what you thought was an easy 2 days of work to a week fighting the "assistant's" solution.
Now I use them for random things, inspiration, or something very specific that won't lead me down the rabbit hole. Luckily for me, I learned that lesson early on, hehe.
Financial_Wish_6406@reddit
Depending on the language and framework, Copilot autocomplete suggestions range from usually useful to straight-up time sinks. Trying to develop in Rust with GTK bindings, I find myself going back and deleting almost every autocomplete, or at least heavily modifying it, to the point where I suspect it costs notably more time than it saves.
DonaldStuck@reddit
Very interesting. I always run around telling people that I think I'm around 20% more efficient using AI tools. But looking at this study I might be wrong.
NotAllWhoWander42@reddit
Is this “devs use AI to write code for them” or “devs use AI to help troubleshoot a bug”? I feel like the troubleshooting/“rubber duck” is about the one good use case for AI atm.
klowny@reddit
That must be why I feel happier using AI for mundane work, because I'm actually doing less work.
Unlucky_Data4569@reddit
It says the study was on repos they were used to maintaining. I find LLMs most useful when working with code I am not familiar with.
GarboMcStevens@reddit
There aren’t really any good, quantitative metrics for developer productivity. This is part of the problem.
Ok-Armadillo-5634@reddit
Lol they used AI to write their research study.
NatoBoram@reddit
Imagine if you had to spend half a day writing a config file before your linter worked properly. Sounds absurd, yet that's the standard workflow for using ESLint, TypeScript, Spotless, Clang and plenty others.
And that's why I made a project template, which solves that problem for me in TypeScript. You may have different tastes, prefer different tools, or want to use archaic JS configs, none of which my template supports.
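For reference, the kind of thing the template bakes in is only a handful of lines these days; a minimal flat-config sketch (the plugin names and options are the usual @eslint/js and typescript-eslint ones, but treat the details as approximate and check the current docs):

```ts
// eslint.config.mjs: a minimal TypeScript-aware setup
import js from "@eslint/js";
import tseslint from "typescript-eslint";

export default tseslint.config(
  js.configs.recommended,
  ...tseslint.configs.recommended,
  {
    rules: {
      // project-specific tweaks go here
      "@typescript-eslint/no-unused-vars": ["error", { argsIgnorePattern: "^_" }],
    },
  },
);
```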
From experience, getting an LLM to work goes like this:
Done!
I dunno on what planet OP lives with that last take.