AI Code is Hollowing Out Open Source, and Maintainers are Looking the Other Way
Posted by yoasif@reddit | linux | View on Reddit | 115 comments
Oktokolo@reddit
The weakening of the copyright protection will soon apply to closed source too. AI is getting stronger and will eventually be able to translate binaries into source code written in a high level programming language.
So yes, for a few years, FOSS licenses may become easy to circumvent.
But after that, all licenses become easy to circumvent. Copyright will finally die.
All software will be free open source, no matter whether the author intended that or not.
yoasif@reddit (OP)
Simply not how these tools work.
Lahvuun@reddit
I've watched an agent do a byte-matching decompilation of ≈200 Lua binaries. The whole thing took about a day. Let that sink in.
With a 2010 AAA C++ title the results were much less impressive, but given more resources (like how Anthropic threw 16 agents at a compiler) it could conceivably have done the job in a reasonable amount of time (weeks to months).
This is what is already possible with current top-of-the-line models. If the rate of improvement stays the same, a couple model generations is all that separates us from trivializing decompilation.
Oktokolo@reddit
I used neural networks a decade ago. LLMs didn't exist. Claude Code didn't exist.
I am pretty sure automatic reversing will be a thing.
I could do it given enough time, and I am just a natural neural network. So I know that a neural network can do it. Human-brain-sized artificial neural networks are probably still quite some time away, but I expect more advancements in the art of model design. LLMs are not the last step.
transcendtient@reddit
The only risk is to closed source. Open source users won't switch to a new system built on AI in 2 weeks.
Thundechile@reddit
"US copyright office has.." - Open Source !== US laws.
yoasif@reddit (OP)
Which open source licenses in common use are based on other laws?
Thundechile@reddit
Most common open source licenses (MIT, Apache, GPL, BSD) are intentionally jurisdiction-neutral.
yoasif@reddit (OP)
It isn't about the license, is it, but rather whether the LLM output is copyrightable?
yoasif@reddit (OP)
Where are they not in effect? The question is simply whether the document does what it purports to do.
shimoheihei2@reddit
To me there's a lot more problems from AI code than just the copyright issue. AI models tend to produce code that is far harder to maintain, because the code is usually longer, solves just one specific problem, isn't reusable easily, and can contain basic security issues that won't get caught if people are lazy (and let's face it, with the amount of vibe coding happening out there, people ARE lazy) and don't review their code.
hex0xX@reddit
Isn't it the case that most vibe coders don't know how to program and write code, which is why they vibe code? I am asking because I am learning programming for fun and don't want to vibe code. Sorry if it's a stupid question.
shimoheihei2@reddit
There's that, but programmers are being pushed to use AI to code as well. There's no arguing that it's faster. So if you can meet your coding quota much faster by vibe coding, your manager is going to want you doing that. The problem is that these skills atrophy. People will know less, and code will be worse and more unmaintainable. But at the end of the day, if that's what companies are pushing, it's hard for employees to say no. They will be replaced by someone who vibe codes.
hex0xX@reddit
That makes sense. But if the quality of the code gets worse with AI, I don't think that all programmers will be replaced, because if code is unmaintainable, isn't it worthless for more complex programs, especially when bugs must be fixed? I like AI as a tool and value it as such, but I try to use it very sparingly, for the reasons you stated. I am scared to lose basic human functions like thinking and coming to conclusions by myself.
rien333@reddit
honestly i also think machines sometimes produce code that is not as easily understandable as code written by humans, and i can see that problem becoming bigger and bigger. Why write something in a high-level language if you can write something in awk that is half the tokens and twice the speed? Or something that uses nested function calls and weird tricks. Easy for a machine, difficult for a human.
Vibe coders do not care about any of this
i860@reddit
Why even have a model produce output in a higher level language at all if it isn’t ultimately designed for humans to maintain? Why not just skip the middleman entirely and have it generate machine or byte code right out of the gate? Obviously I’m being a bit hyperbolic but at the same time I’m not.
Our issue isn’t the ability to write or produce code. The real issue is lack of abstractions and modularity for common patterns in a particular code base and we’re using AI to generate garbage as a way of avoiding doing the hard work.
saltyjohnson@reddit
I can think of two answers:
I would be very surprised if an LLM, as we define them today, could ever generate functional machine code without at least thinking in terms of a higher-level language, and if it has to think in terms of a higher-level language, it might as well just leave it at that.
(Also I know you're being hyperbolic, but it was an interesting thought)
SheriffBartholomew@reddit
Why not just skip the human. Tell the AI to come up with ideas, create programs, audit themselves, and then sell the programs to other AIs. Humans not required! Then maybe us humans could go back to living semi reasonable lives.
DynoMenace@reddit
You see, you missed the important step, which is to increase shareholder value.
Dangerous-Report8517@reddit
There are a ton of vibe coding enthusiasts who already think that asking Claude to review a project somehow counts as a code audit…
SheriffBartholomew@reddit
The machines would do this anyways without the guard rails to prevent them. I remember in the early days of AI exploration Facebook had to pull the plug on one of their projects because the AIs started talking to each other in code that they had invented. The engineers had no idea what the machines were talking about and pulled the plug out of an abundance of caution, but also just because they couldn't do anything useful with something they understood nothing about.
Original-Active-6982@reddit
I so agree with this viewpoint. Have the AI directly spew out machine op-codes as the program is running. Dynamic branching, predictive analysis, ... perhaps even giving the answer without needing to fire up the old-fashioned CPUs.
rien333@reddit
No but this is what i was getting at. I think some people are really going to move towards this, at least to some extent.
Squalphin@reddit
I think that the idea is that humans just will never write any code at all. Everything shall be done by an AI, and in that case, the quality of the code is not important.
blackcain@reddit
lol - yes, that's what the oligarchs think. But wait till they find out that LLMs can also create competitor code very easily. Security is a big thing - eventually there will be a big blow up on licensing.
Wait till AI goes after the content industry; we'll let the RIAA and MPAA deal with that. They've got much more influence with politicians than we do.
SheriffBartholomew@reddit
These oligarchs are missing an enormous piece of the equation that they should intuitively get, since they use the same playbook. They think that they're going to replace all of their employees with AI and bathe in all the excess money. What they seem to fail to realize is that the second, hell, the fucking nanosecond that an industry is 100% reliant on AI, the AI companies are going to raise their rates to that of, or even greater than the cost of the original employees. Then these companies will be fucked. They'll have fired everyone, so nobody will know how to do those jobs anymore. They will have sunk all of their money into this AI setup. And they'll likely be paying some overpriced vibe coder on top of all of that to keep the pieces taped together as it all flies apart. Even if they realize all of this, and they probably do, they'll still pursue it for those few sweet quarters where they blow the top off of their profitability charts. Humanity is fucked.
Dangerous-Report8517@reddit
Also there’s the fact that a model can have subtle malicious training in all sorts of ways, which is an additional risk for companies using AI systems made by competitors (eg small tech startups, even Apple using Gemini for that matter)
blackcain@reddit
But will they get a govt bail out ?!
SheriffBartholomew@reddit
Of course.
spacelama@reddit
Scary that you think this isn't a bad thing. The quality of software out there now is already infinitely worse than software that was written 25 years ago. "So long as it passes these 75 well defined tests that test only trivial aspects of the code, who cares that it fails these 10,000 quality-of-life tests that make our users want to throw our software into the sun?"
Initial-Return8802@reddit
But it was like that before AI as well; JavaScript was what brought that around.
Dangerous-Report8517@reddit
Yeah but the point is that a machine that’s almost as good as bad coders is obviously not going to produce top tier code
Squalphin@reddit
If you had worked in a professional environment, you would know that AI code is anything but good. We manufacture vehicles for construction purposes, and there is no way AI gets to touch even a bit of that code. Industry code also looks very different from most open source projects, which AI has been trained on. I do not want to say that all open source projects are badly written. There are some really exceptional ones out there, but if safety is not a concern, you have a lot of leniency. Safety-related code is very cumbersome to write, as you always have to handle everything, and I mean everything. Things which I would never have considered, especially when I was still fresh out of university.
AI still has a very long way to go until it has a place in any meaningful product, at least where quality matters. Maybe in 10 years things will look different, but for now, we are not impressed with its output.
SheriffBartholomew@reddit
It's infinitely more important in that scenario.
I_miss_your_mommy@reddit
Where do you get the impression it is harder to understand than code written by humans? I've observed the opposite in general. While I'm sure there is some really fine human written code, that is not the norm as far as I've seen. Most developers are lazy and write poorly documented, poorly tested code that is focused on just getting it shipped. AI generated code usually does all the things you'd want out of well engineered code.
SheriffBartholomew@reddit
I'd say that in general, AI code is twice as long as the code written by my senior engineers. Yes, AI does leave lots of comments, but 90% of them are worthless. It won't document the incredibly complicated functions it writes, but it will tell you that the variable to get hello world, which is called getHelloWorld, does in fact get hello world.
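A caricature of that pattern in C (hypothetical code; the `getHelloWorld` name is taken from the comment above, and `fnv1a` is an invented stand-in for a "complicated function"): the trivial helper gets a comment restating its own name, while the function that actually needs explanation gets none.

```c
#include <string.h>

/* getHelloWorld: gets hello world. (The comment just restates the name.) */
const char *getHelloWorld(void) {
    return "hello world";
}

/* No comment at all on the part that actually needs one: an FNV-1a
   string hash whose magic constants go entirely unexplained. */
unsigned fnv1a(const char *s) {
    unsigned h = 2166136261u;
    while (*s) {
        h = (h ^ (unsigned char)*s++) * 16777619u;
    }
    return h;
}
```

The useful documentation effort is inverted: it is spent where the reader needs it least.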
rien333@reddit
im literally talking to a bot rn and it spits out a terrible awk script that, fwiw, didn't work.
AI produces over-commented code in general, at least in the hands of low-skilled devs
sheeproomer@reddit
Ever heard of garbage instructions in, garbage result out?
I_miss_your_mommy@reddit
A good model?
rien333@reddit
yeah blame my prompt
sheeproomer@reddit
Ever heard of these obfuscated contests?
Also, you've never heard of techniques for hiding what an application is doing, even when you have the full source on your desk.
These things don't need an AI; they have existed throughout coding history.
LankySimple9089@reddit
The security angle is what really bugs me. People are getting lazy with code reviews because the AI output looks 'confident.'
I've seen LLMs suggest deprecated C functions or insecure patterns simply because they were prevalent in older training data. If we lose the habit of line-by-line manual review because we're chasing the speed of 'vibe coding,' we're basically automating the next wave of CVEs. Open source is built on trust and auditing, and AI is making both much harder.
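One concrete instance of the kind of pattern meant here (a hedged sketch, not output from any particular model): unbounded copies like `strcpy` are heavily represented in older C code, so a model may reach for them even though a long source string overflows the destination buffer. A line-by-line review should replace them with a bounded, always-terminated write.

```c
#include <stdio.h>
#include <string.h>

/* The pattern an LLM may suggest from older training data:
   no bounds check, so a long src overflows dst (classic CVE material). */
void copy_unsafe(char *dst, const char *src) {
    strcpy(dst, src);
}

/* What review should turn it into: the write is bounded by the
   destination size and the result is always NUL-terminated. */
void copy_safe(char *dst, size_t dst_len, const char *src) {
    snprintf(dst, dst_len, "%s", src);  /* truncates instead of overflowing */
}
```

Calling `copy_safe` with an 8-byte buffer and `"hello world"` yields the truncated, NUL-terminated `"hello w"` rather than a buffer overflow.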
ezoe@reddit
Human written code can be equally unmaintainable.
I think that as the cost of coding decreases, society will accept breaking source-level backward compatibility more than it does today.
i860@reddit
So because some developers suck and shouldn’t have access to a computer we should instead fully embrace the worst practices possible since we can do it quickly? Insanely regressive.
PJBonoVox@reddit
I see this "but humans can write bad code too!" argument all over the place. It's exhausting to explain over and over again how that isn't the point.
ezoe@reddit
We developed many programming language syntaxes, tools and rules to reduce the chance of writing bad code. Some of these inventions are purely for humans. AI coding is not human, so its characteristics are not the same as a human's. In some cases it is better than a human, in other cases worse.
Currently, we're at the beginning of a new era. Just like the invention of the (physical) shipping container changed the transportation of goods forever, AI coding also changes coding forever.
It doesn't matter whether you like it or not.
SheriffBartholomew@reddit
It's a standard debate tactic used by children.
ezoe@reddit
If you think only the best of the best are worthy of touching a computer, we would never have reached the current level of software development. We would still be accessing computers at a university or large corporation, or sending source code or punched cards to a data center and getting the result back many days later.
Any method that eases access to computers will be embraced, whether you like it or not. Personally, I still hate the smartphone. It's a hideously restricted, non-free computer that we should avoid using. But most people use it anyway, and software development was accelerated because of it. If enough resources are poured into something, there will be some developments you don't like.
AI coding will accelerate software development in the same way, and you won't like some of the results.
SheriffBartholomew@reddit
And you think that's a good thing?
ezoe@reddit
It's not good or bad. It's inevitable.
Kobymaru376@reddit
People could do that before too if they were lazy. On the other hand, if you're not lazy and actually have a clue of what you're doing, you can tell the AI to be more concise, keep it more generic and reusable and help you to find potential security issues.
SheriffBartholomew@reddit
By the time I've finally managed to get AI to do everything it needs to do, I could have done the task two or three times over. Yeah yeah, update my rules files and teach it to be better. Why TF would I want to make the thing that employers hope will replace people better?
jasaldivara@reddit
If you're not lazy, you can do the code yourself and get better results.
SheriffBartholomew@reddit
At this point we have people using AI to review and post comments on code that was written by AI, and it all fucking sucks!
i860@reddit
We’ve basically created weapons of mass destruction with the double whammy effect of rotting developers brains while creating unmaintainable hard to verify code.
It’s like having offshore developers on steroids.
inn0cent-bystander@reddit
The digital version of idiocracy.
blackcain@reddit
It doesn't deal with corner cases either. There are a lot of things to fix that are like negotiating a labyrinth.
sheeproomer@reddit
You really never did a serious software project, regardless of whether it was tool-assisted or not.
ConnaitLesRisques@reddit
And using AI has the same problem as assisted-driving. People stop paying attention when the machine is correct 90% of the time.
donut4ever21@reddit
I've built an entire fully functional audiobooks/navidrome player for personal use and never shared it with anyone, and I can tell you that the code the AI puts out is unnecessarily long. For some reason, it always takes the longer route. I've often found so much unnecessary code and told it to remove it and do it a certain way so it codes less. I like AI, but only for personal use where the work is never shared, or is shared but has no bad consequences for others. When it comes to public code that people rely on, absolutely not. At least not for another 10 years.
Dangerous-Report8517@reddit
It will tend to produce highly verbose code for a few reasons:
- the models are generally trained and prompted to be highly verbose
- a lot of the training data is educational material that prioritises things like ease of understanding over efficiency
- another big part of the training data is hobbyist projects on GitHub that aren't skilfully optimised
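The verbosity pattern described above can be illustrated with a hypothetical sketch (not output from any particular model): the same clamp logic written the tutorial-heavy way the training data favors, versus the concise form an experienced reviewer would prefer.

```c
/* Tutorial-style verbosity: every step named and spelled out. */
int clamp_verbose(int value, int minimum, int maximum) {
    int result = value;
    if (result < minimum) {
        result = minimum;
    }
    if (result > maximum) {
        result = maximum;
    }
    return result;
}

/* The same behavior, written concisely. */
int clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}
```

Both are correct; the point is that the verbose form is what educational material and hobbyist repositories overwhelmingly look like, so it is what the model reproduces by default.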
donut4ever21@reddit
That makes sense. Thank you
vilejor@reddit
It's not uncopyrightable because you cannot quantify what is and isn't AI. The second a human makes any notable changes, it's no longer just an AI output.
I wish people would use their heads and be able to distinguish thoughtful articles from blatant mindless AI slander that does not actually help any anti-AI movement, but makes it seem irrational.
ABotelho23@reddit
Parents are responsible for their toddlers. The people instructing AI models to perform tasks should be too.
vilejor@reddit
They are.
This is a needless statement.
Dangerous-Report8517@reddit
Why the hostile response? They’re agreeing with you and expanding on your original comment
iKnitYogurt@reddit
That's the "AI is a tool" view, and it's a no-brainer. But there's plenty of people who already try to, or strive to, deploy AI as completely independent agents. As in: it monitors software, sees issues, makes changes, opens a PR - all without a human ever laying eyes on it, or explicitly instructing it.
I'm very much a proponent of the usage as a tool, and like any tool, the output depends on the human operating it.
The second case is something I'm not sure how I feel about, generally speaking. What's clear, however, is that the models and agent harnesses are not nearly where we would need them to be for this to be an actual option.
dparks71@reddit
I work in a highly regulated industry with licensed engineers. The number of people that act like AI changed anything regarding ethics, liability or accountability is legitimately concerning. If it came from your email, your license is on the line, absolutely nothing has changed. They literally forced me to write policy documents reflecting that.
AshrakTeriel@reddit
You just have to piss off any of the big tech companies with AI-generated code and they will backpedal immediately.
yoasif@reddit (OP)
That assumes that changes are being made. We know that people are using coding LLMs as slot machines - pull the handle and see if it solves your problem. Where is the human making any notable changes?
vilejor@reddit
You're trying to make the argument that it shouldn't be copyrightable to a person who believes copyright shouldn't exist.
yoasif@reddit (OP)
No...
vilejor@reddit
Either way, you don't have to presume. The reality is that it truly doesn't matter.
LvS@reddit
And of course this doesn't apply to GPL code anyway:
If 5% of the project was written by a human under the GPL and the rest is AI, then the only way to distribute that code is under the GPL.
And it doesn't apply to BSD either:
If 5% of the code is BSD then you can do with it what you want as long as you add the "contains BSD code" disclaimer and with the AI code you can do what you want anyway.
Poromenos@reddit
Yeah, this is basically it. I don't care about copyrighting the code the AI writes, I didn't spend much time on it. I do care about copyrighting the decisions I made, decisions which led to the software being what it is, instead of something else. That wasn't the AI, that was me.
yoasif@reddit (OP)
https://www.quippd.com/writing/2026/04/08/ai-code-is-hollowing-out-open-source-and-maintainers-are-looking-the-other-way.html#:~:text=output%2E-,Prompts,output
mistermeeble@reddit
The CAI report actually made a significant distinction between wholly AI generated output and generated output arranged or modified by a human to achieve a specific creative objective.
In other words, vibe coders are out of luck, but use of LLM tools or generated code is not inherently a poison pill as long as the human at the wheel is actually driving - which anyone using LLM tools should be doing already, because even the best LLMs still make lots of really dumb mistakes.
That isn't an endorsement of the big tech models; Due to the opacity and questionable sourcing of their training data, there exists an entirely separate liability issue for code generated from their models.
Dangerous-Report8517@reddit
That implies that vibe coded patches are safe too since they’re being incorporated into a larger project with significant human input. A standalone vibe coded project also would at least not inherently violate someone else’s copyright based on that, it just wouldn’t be explicitly protected by copyright from others
This is true, but only in rare events where an overfitted model reproduces copyrighted or otherwise protected material (e.g. that classic example of a diffusion model that could be prompted to put Getty's watermark on images - the watermark itself was infringing regardless of whether the images themselves were). The mere fact that a model was trained on copyrighted works doesn't in itself violate copyright, amazingly even if the works were acquired through infringing means, such as Facebook literally pirating a ton of books for training and still being in the clear on copyright infringement. It's unethical on the part of the company selling access to the model, but it isn't usually infringement.
pfmiller0@reddit
Another issue I haven't really heard much about is LLM code theft. An AI gets trained on some GPL code and then it can go ahead and reproduce the code for some future prompt with no attribution or acknowledgement of the original code's restrictions.
PsyOmega@reddit
This has the same problem as students.
A student is often trained on existing code. Did they steal it if they take their new-found coding knowledge and create new code?
Human artists are trained on existing art, often beginning their learning by copying it, replicating it, and modifying it. Was the art stolen?
An LLM is much the same. It is trained on existing works, it learns, and then ditches the source training data.
No actual GPL code exists in AI weight models.
yoasif@reddit (OP)
Please.
PsyOmega@reddit
It doesn't actually contain it though. It just has statistical weights that can recreate it from memory, in the same way I can remember and sing lyrics.
astonished_lasagna@reddit
Okay, so if I take a picture of a copyrighted text, then recreate it using OCR and print that, that's fine, because there was a point in between where the work didn't exist as a verbatim copy? That's just nonsense.
Dangerous-Report8517@reddit
No, because the copyright is held on the text, not the ink pattern. There’s no spot in the model where there’s a direct representation in any form of the training data, an overtrained model can recreate stuff that occasionally matches copyrighted work but that’s closer to a student memorising a function they saw and recreating it mostly the same elsewhere and that doesn’t make all outputs from all models copyright infringing.
Having said that, I agree with the sentiment that AI training is exploitative in that massive tech companies are indirectly making a ton of money from the free efforts of millions of humans, but it’s not strictly speaking copyright infringement, in the case of individual people using open weight models for non commercial work I wouldn’t even consider that specific case unethical either.
yoasif@reddit (OP)
That just means it is compressed. What is "memory"?
Upset_Teaching_9926@reddit
AI code needs maintainer review to avoid hollowing out OSS.
Base44 generates full apps for quick prototypes
PlainBread@reddit
If they don't practice more editorial oversight then it just means they're going to have more regressions to fix.
Apprehensive_Milk520@reddit
AI is the darkness and the light - and still being in its infancy, so to speak, no one has gotten a handle on how to, well, handle it. AI is a godsend and is evil all rolled into one - and not so much in and of itself, but rather it is what people do with AI that's rather concerning. And there are no laws governing AI, that I know of, anyway. I have noticed an exponential growth in the volume of disinformation out there in recent years - about everything. It's really rather sad. What's more sad is that most people can't tell slop from reality. It's not their fault, perhaps. They just don't know any better, given all the info they have consumed during the course of their digital lives...
PlainBread@reddit
AI is an extension of the mind.
Just as the mind is a wonderful slave but a terrible master, so is AI.
But if you aren't on top of your relationship with your own mind first, AI will absolutely take control of you.
SheriffBartholomew@reddit
I thought your comment was pretty insightful, even though it's for some reason unpopular.
PlainBread@reddit
Can't short a cult.
Overlord0994@reddit
What a useless comment
PlainBread@reddit
I can see your mind is your master.
SheriffBartholomew@reddit
This comment reads like it was written by AI
MatchingTurret@reddit
There is something to regulate? I wonder who loves regulating stuff..
Commercial_Spray4279@reddit
I love that my government at least cares a little bit about the people.
Schlonzig@reddit
But do you want to write code or go through reviewing dozens of worthless AI submissions?
EarlMarshal@reddit
Just don't? Every maintainer is free to decline/ignore PRs & issues.
PlainBread@reddit
At some point you gotta start banning people based on the value of their contributions.
Maybe people will eventually realize that having an AI model doesn't make them qualified to contribute.
Ginden@reddit
I would suggest not to take such advice from people who are not copyright lawyers.
US Copyright Office issued guidance that some applications of generative AI may be uncopyrightable. Courts are not legally bound to adopt the office's interpretations of the Copyright Act.
yoasif@reddit (OP)
Out of curiosity, which applications are?
SheriffBartholomew@reddit
Probably the ones that have an army of lawyers.
Ginden@reddit
If you ask which applications are - I don't know, and I think no one in the world knows yet.
If you ask what US Copyright Office thinks:
yoasif@reddit (OP)
Thanks for the reference. Not a very strong argument on the other side, but interesting nevertheless.
Apprehensive-Pay8086@reddit
If you're a billion-dollar corporation, it's fine. If you're an individual, it's illegal. Same as most laws.
yoasif@reddit (OP)
😝
Capable-Average4429@reddit
Maybe part of the problem is that there are a lot of people writing thousands upon thousands of words about the issue, and not a whole lot of people helping the maintainers in any way, shape or form.
yoasif@reddit (OP)
We're robbing Peter to pay Paul.
MelioraXI@reddit
but "I built" gives me karma! /s
chmod_7d20@reddit
Look at an old "I built" post and you'll see it hasn't gotten any new features since the original post.
global-gauge-field@reddit
Part of the problem with these personal "projects" is their end goal. I posted only a few projects I did on reddit, all of which were things I needed to use and cared about. So I was already dog-fooding the product myself before submitting it to any social media.
When it comes to these promotion posts, they are nothing like an organic software development process, where the original author creates a piece of software to solve a problem for themselves first (and then makes it available to others). If you combine this with vibe coding, you become an intermediary between your alpha users and the coding agent, which seems like a really weird and inorganic process. The only scenario where this makes sense is if you want to sell online courses etc. at the end.
nicman24@reddit
lol maintainers are using ai
MatchingTurret@reddit
Old man yelling at clouds (pun intended). It's happening and it won't go away.
billyalt@reddit
This is like celebrating that we're building homes out of cardboard instead of brick.
MatchingTurret@reddit
I'm not celebrating.