If the Rust Coreutils can use the MIT license, does that mean that any open-source project can be rewritten with a different license?
Posted by OrwellianDenigrate@reddit | linux | View on Reddit | 181 comments
I didn't know rewriting code was enough to allow you to change the license, but that seems to be the case for the coreutils. I understand there is more to it than just rewriting the code, and you need to be able to prove you didn't copy the existing code.
With how AI is progressing, having a team of developers rewriting code could become less of an obstacle.
I don't think anyone is just going to rewrite the Linux kernel, but it does seem as if it could become a problem for smaller projects, where a bad-faith actor wants to use the code with a different license.
C0rn3j@reddit
You can create any piece of software, open source or otherwise, under any license you wish.
If you create it from scratch, that is.
roboj3rk@reddit
Aren't the GNU Tools just re-implementations of proprietary UNIX Utilities?
If it was okay to re-implement the Unix utilities as GPL-friendly GNU utilities, why can't someone re-implement the GNU tools under an MIT license, assuming it's done properly?
Regular-Impression-6@reddit
They were clean-room reimplementations, and not very faithful ones at that.
The original BSD OSS Unix tools from Berkeley still exist, too.
dougmc@reddit
They were certainly re-implementations, but "clean room" implies a level of care that really wasn't there.
"Clean room" design typically has one team look at the original product and write a specification, and then another team looks at the specification and reimplements the thing, where here the team that made the GNU tools was working with the original utilities directly. Presumably they didn't have access to the original source code (though I wouldn't be surprised if some contributors did have that access), but they were definitely working directly with the original product.
It's possible that a few tried to do the clean-room approach, but that definitely wasn't the norm.
BitOBear@reddit
They were rewritten based on the manual pages and FreeBSD. Which is clean enough.
I was a System V Release 3 and System V Release 5 licensee back in the 80s. And a lot of the code in Unix, and Windows, and Mac OS all came from FreeBSD to begin with.
The room didn't have to be clean because the FreeBSD license was "take it and do with it as you will."
And what they chose to do with it was create the GNU project.
It is well known to be clean enough because it has been challenged in court repeatedly, all the way up through the Unix wars that involved the Santa Cruz Operation, AKA SCO, and their subsequent purchase of Unix from AT&T.
gplusplus314@reddit
I never knew of a link between Windows and FreeBSD. Do you have a link for more info?
BitOBear@reddit
Separately, the original Unix system architectures were turned into an operating system called CPM, and then CPM became the basis of MS-DOS, and MS-DOS underpinned all of Windows.
Up until people decided that software was a true commodity, it crossbred and cross-influenced just so hard.
Basically, the advent of Microsoft and their competition with Apple, and collectively their aggressive stance on closing their own technologies, inspired a collapse in the software development ecosystem.
The entire Free Software Foundation was started to stem that collapse. And the entire point of the GPL was that corporations were harvesting wholesale out of the various BSD-licensed software and returning nothing to the common pool, so Richard Stallman et al. developed the license that required contribution to the pool from which they were plundering.
To this day the phone in your own hand still uses the AT command set internally that was originally developed by Hayes in the commercial creation of the commodity telephone modem.
Commercialization made a huge mess out of what could have been a much more powerful technology by now.
Everything used to interoperate, but that level playing field wasn't good for the would-be takers and their desire to become the billionaires we live with today.
gplusplus314@reddit
Where did you find this information?
The basis of MS-DOS was Seattle Computer’s QDOS (Quick and Dirty Operating System). Literally zero UNIX or CPM lineage whatsoever.
BitOBear@reddit
If you would like to read more, I suggest Google searching "how much of CPM ended up in MS-DOS."
It's known that a good bit of CPM's internal disassembler was used to create the foundation of the other operating system.
BitOBear@reddit
I lived through it.
In the case of Unix and CPM it wasn't necessarily literal copying.
But the internal flow was identical.
File control blocks.
Specifying the device name and a colon before a particular designator.
The use of the 0x80 interrupt to call into the operating system (on specific hardware).
It all came from the same books, design philosophies, and shared snippets of information that were being gargled enough chucked in places like Berkeley.
BitOBear@reddit
I don't see my other reply...
Go ahead and Google "freeBSD elements found in Windows and Mac"
BitOBear@reddit
There's no one source, but if you Google "FreeBSD elements in Windows and Mac" you will find plenty to read.
Slackeee_@reddit
This. If you want to see a true clean room implementation look at ReactOS, where developers have to sign a statement that they never had access to any Windows code before they are allowed to contribute.
anikom15@reddit
Clean room isn’t actually necessary for avoiding copyright violations. It just guarantees it.
dougmc@reddit
Yes, that's my point -- "clean room" implies a level of care they didn't need.
And the list of contributors to GNU coreutils is huge -- hell, I'm even on their list (though I don't think I've contributed any actual code, just bug reports and the like). Hundreds of people, and if any of these people directly worked with the utilities that coreutils replaces then that violates the clean room design philosophy right there -- and many of the people on the list most certainly did work with them.
anikom15@reddit
I’m not too worried about it. It will inevitably turn into a giant mess of spaghetti code that nobody will want to maintain (not unlike coreutils) while GNU quietly keeps trucking along.
Regular-Impression-6@reddit
Yeah, 'clean' was generous.
It was more like a messy garage or kitchen table, where sw dna merged and mixed. But to call them copies is wrong too. Kinda like Camaro and Mustang and Challenger are all muscle cars, and if you can drive one, you can drive any. GNU folks took every opportunity to "enhance" them. And with the exception of a few, like gcc, gawk, bash, they kept the same names.
I'm up to my ears just now on a problem rooted in the fact that GNU sed isn't Unix sed.
I don't think there's a copyright issue.
Trademark issue, now, that's a question 😉
Bleech!
baby_shoGGoth_zsgg@reddit
https://www.gnu.org/prep/maintain/maintain.pdf
this document goes over tricks to avoid copyright issues when you’d seen the original sources by doing things like optimizing for memory usage when the original at&t sources optimized for cpu, etc.
if you’re not basing things on the source, you don’t have an issue anyway, but gnu themselves weren’t even that strict about it, and offered explicit guidelines for skirting copyright while reimplementing proprietary tools you’d seen the source of.
RenderedKnave@reddit
initially the point was to implement AT&T unix tools without using any AT&T code, which both the GNU project and (much later) the BSD project aimed to do. They both succeeded
MatchingTurret@reddit
Yes and no. They are specified here: The Open Group Base Specifications Issue 8 IEEE Std 1003.1™-2024 Edition (under Shell & Utilities).
This is an open standard that formalizes the proprietary tools originally developed for proprietary Unix.
Itchy_Journalist_175@reddit
I think that this is the point he is making. Hard to prove that these rewrites weren’t inspired by the original ones in a “clean room” environment when the code has been out in the open for decades. If I was R. Stallman I’d be pissed.
lestofante@reddit
They are written in a completely different language, that is enough in my opinion
mrtruthiness@reddit
It's a good thing your opinion doesn't mean much in terms of copyright law.
For example: If someone were to create a translation of Hemingway's "The Old Man and the Sea" into a different language, it would be a derived work with ownership from both Hemingway and the person who created the translation. It could not be distributed without a (compatible) license from both.
lestofante@reddit
But with software it is never a 1:1 translation, there is always some reinterpretation..
But yeah, I guess the line is much thinner than I thought, looking at people's reactions
mrtruthiness@reddit
Language is never a 1:1 translation. There is always a difference in phrasing and word choice that can convey a different sentiment. The best literary translators are golden and are artists in their own right.
suid@reddit
What do you mean "inspired"? If I write a new C compiler, is that "inspired" by GCC because it works "the same way" (i.e. compiles the same language)? That's nonsense. This isn't art school.
Programs are written to a spec. The spec is open. Anyone can write any program that does the same thing, as long as they don't copy large blocks of code from an existing program to do it.
CaydendW@reddit
Well, the C standard is an open spec, and anyone can implement it. This is more akin to looking inside GCC and copying how it does its codegen or its parser and calling it your own. Clang is its own compiler that still compiles the C programming language, but in a completely different way. BSD are their own coreutils written in their own way.
What I think the person you're responding to is insinuating is that they might have not merely followed a spec and implemented it, but rather looked at the code available from coreutils and transcribed it, which is not really the same thing. The latter is subject to GPL. Following a freely available specification of functionality, as you said, is not.
Whether this is true or not, I can't say. I'd hypothesise that the Rust coreutils maintainers understand they're running a large project and will implement it in a clean room style where they only read man pages and develop to a spec instead of copying code, but I can't confirm this.
Also, one could maybe argue that the GNU coreutils standard is not freely able to be copied if the copyright is held, but I don't think this works. Especially due to the pages being licensed how they are. Someone with more legal knowledge could comment on this with a far more informed opinion than I can.
zoharel@reddit
Sure, but there's no reason to assume they did this. Coreutils aren't exactly the most complicated pieces of software in the world, after all. It might be as easy to reinvent them as to read through the old code.
CaydendW@reddit
I agree with you. I'm just trying to clarify what I thought the commenter was saying. I very much doubt they're stealing code.
Also, the coreutils can be oddly complicated in some odd situations. Stuff like yes(1)'s throughput optimizations and GNU's extensions to awk. Not crazy stuff, but not as simple as say, sbase for example.
That being said, I still don't think the Rust coreutils writers are stealing code. I reckon they're more than capable of doing it themselves. Was just trying to clarify what I thought someone meant.
zoharel@reddit
I can't imagine anything yes is doing to be that bad. Awk is clearly a bit more of a mess, sure. There are also a great number of implementations, which both complicates things and might make it even less likely that you get a full-on copy of one in particular.
Anyway, it's very strange to me that some people are concerned that this may have happened. Clearly, if there were some indication that it did happen, it would be appropriate to hammer that out, and of course it would have been inappropriate to switch licenses. "The original code is public" is hardly a reasonable argument for why it might have happened, and if people are just concerned about idealistic purity with respect to free software, well, they should not be. The risk of a set of core utilities being balkanized by some commercial effort is as close to zero as one can get, and that's literally the only protection something like the GPL would offer. Fundamentally, even Richard Stallman, of all people, seems to have few complaints about BSD-style licenses, though he doesn't use them himself.
That got away from me a bit, but: There's no indication anyone has done anything improper, and the license is fine.
CaydendW@reddit
About this yes thing, you might be a little surprised: https://www.reddit.com/r/unix/s/jGGjtrfCEu. It's not exactly rocket science, but it is interesting nonetheless.
Personally, I dislike large projects with MIT or BSD licenses, but that is an ideological disagreement from me and not a moral one.
zoharel@reddit
Oh, I'm definitely surprised that any optimizations are happening at all.
CaydendW@reddit
GNU know their way around a C compiler ig. It's such a needless optimisation; I love it.
zoharel@reddit
Looking over the summary, I mean, it's exactly what you'd expect if you were optimizing this in particular. ... which, yes, still seems entirely unnecessary. Clearly somebody got really, really bored one evening, or was using yes for something extremely off-label, or both.
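For anyone who doesn't want to click through: the general idea behind this kind of throughput optimization, as far as I understand it, is to fill a large buffer with repeated lines once and then write the whole buffer per call, instead of issuing one write per line. A rough Python sketch of that idea follows. It is not GNU's actual code; `build_buffer`, the `max_writes` parameter, and the 8 KiB default are made up for illustration.

```python
import sys


def build_buffer(line: bytes = b"y\n", target_size: int = 8192) -> bytes:
    """Repeat `line` as many whole times as fit in `target_size` bytes (at least once)."""
    repeats = max(1, target_size // len(line))
    return line * repeats


def yes(line: bytes = b"y\n", max_writes=None) -> None:
    """Write the prefilled buffer over and over; each write call emits many lines."""
    buf = build_buffer(line)
    out = sys.stdout.buffer
    writes = 0
    try:
        # max_writes is a hypothetical knob so the loop can be bounded when experimenting.
        while max_writes is None or writes < max_writes:
            out.write(buf)
            writes += 1
    except BrokenPipeError:
        pass  # the reader went away, e.g. `yes | head`
```

Calling `yes(max_writes=1)` prints one buffer's worth of `y` lines; the naive version would need thousands of separate writes for the same output.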
ronchaine@reddit
GNU Coreutils aren't a GNU invention, they are largely a GNU reimplementation with an open licence that AT&T UNIX didn't have. BSD systems have their own implementations of much of the same stuff. A good bunch of all of that is standardised under POSIX, which is handled by ISO/IEC JTC1 SC22, which is the same standards committee under which programming languages such as Ada, C, C++ and FORTRAN are standardised.
CaydendW@reddit
True, but the GNU coreutils have a lot of extensions that are GNU inventions, which is what I was eluding to. Stuff like the various ls options, grep extensions, etc. Afaik the rust coreutils project is attempting to replicate these as well as the standardised POSIX options available. I think it's fair that we can consider those to be GNU inventions.
The_Real_Grand_Nagus@reddit
*alluding
pdxbuckets@reddit
Inventions are not covered by copyright. That’s what patents are for. For example, food recipes are not subject to copyright (other than the narrative blurbs), which is why you can get NYT recipes from all the recipe aggregators on the internet.
Hunter_Holding@reddit
Inventions, sure, but doing ls --help and then exercising the various options to see the output then making something that does the same is perfectly fine to put under a different license.
The trickery would come if they were just straight converting the code, so you'd see the same logic, function names, etc. But if they're 'common enough', i.e. you'd expect someone who's never done anything related to this before to name something that way when writing it in a black box without having seen the other source code, then it's also probably safe.
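To make the "exercise the options and see the output" approach concrete, here is a minimal Python sketch of a black-box comparison: it only looks at exit code, stdout, and stderr, never at source. The helper names `run` and `same_behavior` are hypothetical, and the two Python one-liners are stand-ins for "the original tool" and "the rewrite" so the example runs anywhere.

```python
import subprocess
import sys


def run(cmd):
    """Run a command and capture what an outside observer sees: exit code, stdout, stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def same_behavior(reference_cmd, candidate_cmd):
    """True if the two commands are indistinguishable from the outside."""
    return run(reference_cmd) == run(candidate_cmd)


# Stand-ins (assumptions for this sketch): two differently written programs
# that should behave identically, plus one that should not.
reference = [sys.executable, "-c", "print('a'); print('b')"]
candidate = [sys.executable, "-c", "import sys; sys.stdout.write('a\\nb\\n')"]
different = [sys.executable, "-c", "print('a')"]
```

In practice you would swap the stand-ins for the real binary and the reimplementation and loop over a list of option combinations.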
CaydendW@reddit
Exactly my point. If they're not copying GNU code and algorithms, that's good for them. I think the insinuation was that they might be copying and subsequently relicensing. Perhaps I misread the post, but what you said was more or less my point.
Hunter_Holding@reddit
"Copying" in this case could also include, and be a copyright infringement/prosecutable, if they had the source up in one window, and line by line rewrote it into another language.
The way I described it would be effectively clean-room / functional spec reimplementation.
My comment was basically "if they're doing it clean-room style, then it's fine"
CaydendW@reddit
I agree with this. I think we're vehemently agreeing
Far_Calligrapher1334@reddit
IANAL, so I'm not gonna make a statement here, but there is plenty of software that reimplements existing copyrighted code legally (afaik, at least they aren't prosecuted anyway) and people mostly cheer for it, why would this be different?
randuse@reddit
Clean room is not a requirement. It is a precaution against companies like Oracle who have more lawyers than engineers.
dnu-pdjdjdidndjs@reddit
how does nobody get this
Moscato359@reddit
Inspired by is very different than copying code
Brainwormed@reddit
You do know that most open-source tools are re-implementations of older, closed-source ones, right? GNU (as in GNU/Linux) spent most of the 1980s and 1990s rewriting closed-source or BSD-licensed utilities -- including an entire compiler -- under open-source licenses.
That is literally why Gnu's Not Unix.
jonathancast@reddit
Technically, the BSD license only goes back to 1989. Most of the BSD utilities that were rewritten for GNU were still proprietary - even in the BSD version - when the GNU version was written. It doesn't fit the purpose of the GNU project to rewrite free programs, since the goal is to produce a free operating system, not to write one from scratch. They typically just adopt free programs when they can.
The only exception I can think of is GRUB, which was started well after LILO was already production-ready - for GNU/Linux. I think the reason, though, was that LILO couldn't boot HURD, so they needed a new bootloader. HURD itself was started before Linux existed, of course, when the BSD kernel was still under AT&T and University of California proprietary copyright.
Of course, the other exception is GNOME. I believe the facts are these:
At the time GNOME was started, Qt was under a nonfree (no modification) license. KDE was therefore non-free by virtue of depending on a non-free library. (The non-free Motif toolkit had been used by free programs before, such as GNU Emacs, but as an option not an absolute requirement.)
Qt's first free license was the QPL, which is incompatible with the GPL. (If you distribute a modified version of the toolkit, or any program using it, you are obligated to send Trolltech a courtesy copy. This also makes the license not Debian free, and probably makes it not open source even though the OSI approved it.) This made KDE programs non-free unless they had a linking exception, and was an inconvenience in general.
Qt has been GPL-licensed (among other things) since 2000, which means KDE has been unambiguously free software since then; but, by that point, GNOME was already well-established.
Brainwormed@reddit
The only exception you can think of is GRUB?
Like, Richard Stallman wrote the first versions of Emacs and GCC himself. Coreutils, glibc, and Bash are also all kind of a big deal.
jonathancast@reddit
And which of those programs had free versions before he wrote them?
I know he wrote GCC specifically because he couldn't find a free C compiler.
Again, there was no free Unix coreutils when the GNU project started. In 1984, BSD was still proprietary software.
Bash also goes back to about 1989, and is a clone of ksh, not actually the Bourne shell. Both of which were proprietary at the time anyway.
Brainwormed@reddit
I never said anything about rewriting free utilities. GNU rewrote proprietary utilities -- I used the term "closed-source," which is generally the same thing excepting corner cases. BSD is different from proprietary but also non-free (whether it's a better license than e.g. the GPL is a separate issue).
jonathancast@reddit
You said "closed source or BSD-licensed". "BSD license" universally means one of the free versions, starting with the University of California's four clause license from 1989. Which is and always has been considered a free license, but wasn't used for any software until 5 years after the GNU project started.
NotQuiteLoona@reddit
Clean-room rewriting? Can use any license it wants.
atred@reddit
"Clean room" is not even a legal requirement, it's a precaution that people took in some cases "see, I wrote that without even reading the code".
mallardtheduck@reddit
It's not written in statute, but it is the only process that has been tested and established as legal in court. You're free to try other processes if you're willing to accept the risk.
torsten_dev@reddit
AI rewrite is a new option. I'd be very curious to see courts tackle that one.
semi-@reddit
AI output is generally not considered copyrightable, which would be a problem with trying to apply any license to it
linmanfu@reddit
...by the AI. That's a critical caveat. All the cases I'm aware of concern a guy called Stephen Thaler who tried to claim patents and copyrights with his AI as the author. The UK and US Supreme Courts have both rejected that. But I haven't seen any cases where courts have set precedents about whether works produced by LLMs can be copyrighted by the person operating the LLM or the person(s) who created the works that they are derivatives of.
The Thaler cases are consistent with the 20th century cases on photocopiers. There were attempts to claim that the photocopier manufacturers were responsible for copyright infringement with the machines' output, and the courts threw those out too, saying that the human operators held the responsibility. What you put into the machine determined what you got out of the machine. So we might find that LLM outputs are copyrightable, perhaps by the prompt writer, perhaps by whoever trained the LLM, or perhaps by some chimeric combination of the two.
nelmaloc@reddit
Zarya of the Dawn, although not a court ruling, says that prompts can fall under copyright, but their output doesn't.
vyPal@reddit
An issue with that is that, if I'm not mistaken, you can't copyright or otherwise license code generated by AI, I remember this being the decision of some court in the US
LousyMeatStew@reddit
I don’t think they need to.
AI is just a black box. The whole point is that you give it the source code you want to copy and you get back a rewritten version. It doesn’t actually matter if the black box is an LLM or anonymous freelancers.
I think what people are missing here is that they focus too much on the output process - “the magic black box gives me new code” rather than the input process - “I give it the source code I want copied”.
That latter part is what I suspect will be the deciding factor if this ever goes to court. If you put source code you don’t own into a black box with the explicit goal of creating a functionally identical replacement just so you can put a different license on it, it doesn’t matter what’s inside the black box.
linmanfu@reddit
Yes, and it seems similar to the photocopier cases from the 20th century. If a photocopier switched pages at random, does that mean you've created a new book, or have you still infringed on the copyright of the one you put into the machine? The answer to that is pretty obvious. But where it gets tricky is going to be assessing the relative contribution of the person who trained the LLM and the prompt writer.
LousyMeatStew@reddit
No, these would need to be handled as separate cases. If you want to enforce your copyright against a project that stole your code and refactored it with AI, you can only go after the project. Sony v Universal basically says that the makers of the tool used for infringement are not liable for infringing uses of said tool.
The exception is if you want to argue that the tool lacks substantial non-infringing uses. Then you need to go after the tool maker, ie the ones who trained the LLM. This was the basis for A&M v Napster.
As far as I'm aware, there’s no precedent for liability being shared.
mallardtheduck@reddit
If the same LLM is fed the original code and outputs the "rewrite" I'd imagine it'll be treated the same as any other software tool that operates like that (e.g. "transpilers") and the output would be considered a likely-infringing "derivative work" of the original.
If one (instance of an) LLM is fed the original code, produces human-readable documentation and then another (instance of an) LLM takes the documentation and produces new code, well that's the clean-room process implemented with LLMs instead of humans and would likely be treated the same way.
Of course, there's always room for courts to issue "surprising" rulings, so don't take anything I said as fact, it's simply a layman's opinion.
wintrmt3@reddit
The problem is all the LLMs training data 100% includes that source already.
LousyMeatStew@reddit
No, this is a mischaracterization. The use of clean room design is supplied as evidence of non-infringement. Its use as evidence has been tested, but this is a question of relevance, not of legality. No ruling I'm aware of directly ties a finding of non-infringement explicitly to the use of clean room design.
Because it's evidence, it can end up being challenged and thrown out - something that happened during Meishe v TikTok, which TikTok eventually settled.
Reverse engineering in general has been considered fair use by the courts since Nintendo v Atari (in that case, Atari used illicit means to obtain code for Nintendo's lockout chip and that's what did them in, not the reverse engineering), which is why the DMCA had to make reverse engineering of copyright protection systems explicitly illegal.
To be clear, "fair use" itself isn't an automatic defense either. You're still subject to the four factors test of Campbell v. Acuff-Rose, but this too has been tested in courts with Google v Oracle being the most notable example.
bigbearandy@reddit
Yeah, clean room code really hasn't been a big thing since people were trying to replicate IBM's APIs without access to the source code for commercial alternatives. There isn't much in the market right now that fits that kind of use case, where one dominant player pushes out all competition from a systems engineering perspective.
deviled-tux@reddit
Wine is a clean-room implementation of the Win32 API.
In fact you are not allowed to enable Win32 call tracing in Wine if you are using original Windows DLLs, as the traces will contain a call stack and hence compromise the clean room status of the project.
Probably this is the most notable project like this atm
bigbearandy@reddit
That makes perfect sense, but WINE has been around a LONNNNG time.
guri256@reddit
The idea of clean room rewriting is built around the idea of proprietary software, decompiling, and reverse engineering secret things. For example, you might need to decompile MS Word to help you understand the .doc spec, so you can write OpenOffice.
Many of the coreutils can be implemented using the POSIX spec and the man pages. No need for this sort of reverse-engineering since you’re both programming to a spec that already exists. You can then fill in many of those gaps with the man pages.
Also, there’s no requirement that it be sufficiently different. Some utilities, like “false” or “cd” might have such a simple obvious implementation that the two sets of source look almost identical (ignoring language differences), and that’s fine.
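To illustrate how little room there is for variation, here is a hedged Python sketch of the whole "logic" of `true` and `false` (the function names are mine, and real implementations also handle things like `--help`). Any two independent authors would end up with essentially this:

```python
def true_main() -> int:
    """Equivalent of `true`: do nothing, report success."""
    return 0


def false_main() -> int:
    """Equivalent of `false`: do nothing, report failure."""
    return 1
```

A script would end with something like `raise SystemExit(false_main())` to turn the return value into the process exit status.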
Pragmatically, copyright is only really an issue if you are worried about the other side suing you. And I would expect that the GNU foundation would generally try to avoid suing if it’s pretty obvious to a software developer that the two were independently written. Suing software developers for releasing open source rewrites (that don’t use the original source code) is very much opposed to their mission statement
Legitimate_Law8275@reddit
no cap
FlorpCorp@reddit
I'm not a lawyer, but as long as it's a cleanroom rewrite then I believe so yeah.
Mr_Lumbergh@reddit
This is how Compaq (IIRC) opened the door to IBM clones. They had one guy explaining what was happening in BIOS during the boot process and another that was writing code to reproduce it.
MatchingTurret@reddit
Columbia Data Products
RootHouston@reddit
Didn't CDP's version of the BIOS have compatibility problems though? I believe the reason why Compaq's took off so much more was because it had far greater compatibility.
Mr_Lumbergh@reddit
I didn't remember the company correctly then.
MatchingTurret@reddit
Multiple companies did it, but CDP was the first. Compaq however had the largest impact.
Mr_Lumbergh@reddit
Ah. I knew Compaq was involved somehow, but in my flawed recollection I was thinking after they did it was sort of "out of the bag" and others plowed ahead with what they'd done.
Natural_Night9957@reddit
Let's be clear here: three-letter agencies.
Rust is a virus that was engineered to kill FOSS
Business_Reindeer910@reddit
that makes absolutely no sense. I'd really suggest you read about the guy who actually made it and his history in FOSS.
Natural_Night9957@reddit
Please don't imply that a Mozilla Foundation guy (even if it's the creator) back then has all the saying about Rust.
Business_Reindeer910@reddit
so if it's not the language itself ,then what is it?
Natural_Night9957@reddit
Just go back to the OP question.
Business_Reindeer910@reddit
We already know how rewrites are legal in general. There's nothing more to say about that.
Natural_Night9957@reddit
That is nothing more still convenient to corpos boot lickers.
Business_Reindeer910@reddit
LOL.. so you can't back up your claims at all and resort to insults.
Note, i am not defending AI rewrites here, just speaking of what's legal or not.
And you are not even defending your point about rust.
Natural_Night9957@reddit
Is this disingenuousness or naivety? In capitalism everything can be made legal if one has enough money. Rust is the tool chosen for the rewrites: MIT based with a lot of artificial proselytism among a not very politically bright userbase.
Business_Reindeer910@reddit
python was the tool for the rewrites in the past, as was C.
Natural_Night9957@reddit
You're slippery sloping
Business_Reindeer910@reddit
I've said what i had to say, and proven that you don't actually know much about the software ecosystem at all. So that's enough for me. Bye now.
Natural_Night9957@reddit
lmao
Separate-Royal9962@reddit
The AI rewriting angle is the real concern here. With LLM code generation getting better, the effort barrier to "clean room rewrite" is dropping fast. AGPL partially addresses this for network services, but the fundamental question is whether copyright-based licensing can survive when the "cost" of reimplementation approaches zero.
TimChr78@reddit
Legally you can rewrite anything to whatever license you want (including closed source) as long as you don’t use any of the existing code.
If someone wants to prove that they haven’t used any of the code - they might use techniques such as clean room engineering, but it is not a legal requirement.
Choreboy@reddit
What happens if some of the code is the same by virtue of "well this is how you do that function so that chunk of code couldn't be different"?
Business_Reindeer910@reddit
It's been seen as fine, otherwise folks wouldn't be able to implement many of the common published algorithms.
Godzoozles@reddit
In the case of Rust Coreutils they make their progress on the basis of passing the tests written by GNU Coreutils.
Contrast that with something like SQLite where the tests are not FOSS. You will not create a competitor to SQLite because you cannot use their test setup against them to validate your own implementation.
My proselytizing: Anyway, I hate the Rust Coreutils because of the license. MIT-licensed code is the Trojan Horse gift to proprietary software.
Business_Reindeer910@reddit
no, because we've had tons of BSD licensed implementations FOR YEARS.
Hot-Employ-3399@reddit
No, only closed source project. /s
What kind of question is it? Have you tried to think for one minute?
piesou@reddit
If you rewrite code using AI, you can't copyright the resulting code so picking a different license is pointless.
nandru@reddit
genuine question: why?
piesou@reddit
Why what?
nandru@reddit
why can't you copyright code rewritten by AI?
piesou@reddit
For the same reason that a photo made by a monkey can't be copyrighted: it needs to be created by a human author. There's precedent in the Thaler vs. Perlmutter case but IANAL.
nandru@reddit
Oh, I see.. thanks for the explanation!!
dnu-pdjdjdidndjs@reddit
copyright office speaks I listen on fonem
RoomyRoots@reddit
AI is a legal nightmare, but, technically, yes.
I don't know or care for the Rust coreutils, but I believe they were written from the ground up while sharing the API, so that alone is not an issue. Otherwise we wouldn't have Linux as it is nowadays. Kinda like Wine is not Windows.
MatchingTurret@reddit
They share the command-line options, but that's usually not considered an API.
RoomyRoots@reddit
From the POV of the shell, it's the same. Its main interface is being executed like that.
Business_Reindeer910@reddit
But it's not the same, though. It's completely legal to call a GPL program via the command line, even from a proprietary program. It wouldn't be legal to link to the library if it had one.
braaaaaaainworms@reddit
API ≠ ABI ≠ linking to a library
Business_Reindeer910@reddit
the ABI distinction doesn't matter.
RoomyRoots@reddit
And? It is still an Interface and I didn't imply that an API would violate the law, I even use Wine as an example.
MatchingTurret@reddit
That's why I hedged a bit with "usually".
OneTurnMore@reddit
I honestly can't think of any examples of command line options not being considered an API.
loonyphoenix@reddit
Nope, that's exactly what an API is. The documented way that you interact with an application programmatically (that is, when you're calling it from other programs or scripts) is its Application Programming Interface. Command-line parameters qualify. Today's narrow interpretation that an API must be a Web API of some sort is just that, narrow.
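A minimal sketch of what "calling it from other programs" looks like in practice. The caller only relies on the documented options, exit code, and output of the program it runs. The Python interpreter and its `-c` option are used here purely as a stand-in for any command-line tool, and `run_cli` is a name made up for this example.

```python
import subprocess
import sys

def run_cli(args):
    """Run a command and return (exit code, stdout); the caller sees only the interface."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout

# Stand-in for any command-line tool: the Python interpreter itself,
# driven through its documented -c option.
code, out = run_cli([sys.executable, "-c", "print('hello')"])
```

Any other program that accepted the same options and produced the same output could be swapped in without the caller noticing.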
MatchingTurret@reddit
As I wrote earlier, I'm aware of this, which is why I hedged with "usually".
esanchma@reddit
There is an emerging issue if the cost of clean-room reimplementations becomes marginal or effectively zero. One agent can reverse-engineer a system and produce a specification, and another can reimplement it from that spec with no direct code reuse.
In that scenario, there is a real strategic shift: if reimplementation is cheap enough, the coercive power of copyleft weakens because avoiding the license becomes viable. At that point, the question is whether these licenses remain relevant as a mechanism when "just rewrite it" becomes the easier path.
Ginden@reddit
This weakens the coercive power of copyright too: I can rewrite proprietary software. And within a year or two, doing this will be possible for an open-weights model running on hardware arguably cheaper (in relative terms) than 1989 PCs (when the GPL was created).
esanchma@reddit
Absolutely. The differentiating factor may be the willingness to sue, since the process itself is often the punishment.
tav_stuff@reddit
This is literally how we got the BSDs in the first place. UNIX was proprietary but had source code distributed, and the guys at Berkeley rewrote the OS from scratch with an open source license to give us the BSDs
help_send_chocolate@reddit
That's not an accurate summary of the origin of BSD at all.
tav_stuff@reddit
No it's not -- the ORIGIN of BSD is much more complicated -- but the reason that *we* have BSD (unlike a system like Windows which has *you*) is precisely this.
dkarlovi@reddit
The issue is that you can now set up a test harness where you allow the coding agent to run the original project's tests without ever reading the original tests, let alone the original source code. This allows you to rewrite basically anything, and it's currently unclear whether you're violating any license; apparently you're not.
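A rough sketch of that kind of harness, under the assumption that the test cases are simple input/expected-output pairs: the code being developed is scored against tests it never gets to read, and only a pass count comes back. Everything here (`score`, `reverse_lines`, the sample cases) is hypothetical and only illustrates the idea, not any real project's tooling.

```python
from typing import Callable, List, Tuple

# Hypothetical hidden test case: (input, expected output). In the scenario
# described above these would come from the original project's test suite.
HiddenTest = Tuple[str, str]

def score(candidate: Callable[[str], str],
          hidden_tests: List[HiddenTest]) -> Tuple[int, int]:
    """Return (passed, total) only; inputs and expected outputs are never exposed."""
    passed = 0
    for given, expected in hidden_tests:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply counts as a failed test
    return passed, len(hidden_tests)

def reverse_lines(text: str) -> str:
    """Hypothetical candidate implementation: print lines in reverse order."""
    return "\n".join(reversed(text.split("\n")))

tests = [("a\nb\nc", "c\nb\na"), ("x", "x")]
result = score(reverse_lines, tests)
```

The only signal flowing back to whoever writes `reverse_lines` is the `(passed, total)` pair.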
kansetsupanikku@reddit
Realistically, it's hard to ensure that AI is a clean room, as the confirmations stated by the conversational models are often false. If any stage of training involved publicly available sources scraped from the web, that includes GPL code, and no sensible legal regulations cover it for now. While some countries are lobbying pro-AI in such scenarios, denying that connection seems detached from reality. Unless you can prove that no training sets included copyleft code.
dkarlovi@reddit
Which article of which GPL is being violated in this scenario?
kansetsupanikku@reddit
If the output is licensed as GPL as well: none.
Otherwise: 5c
dkarlovi@reddit
Assuming GPLv3, 5c applies to "Modified Source Versions", which this is obviously not: in this hypothetical, the coding agent started with an empty folder and no access to the GPL source code, so it couldn't modify it.
kansetsupanikku@reddit
As long as it has "no access to the GPL source code", yes. But that should include access prior to running the agent. Your typical code-aware LLM is not a clean room. And none of the popular ones is trained with the "non-copyleft" handicap. A neural network is a function that maps training data and a prompt to a fuzzy set of outputs. Implicit as it is, it counts as processing (modification) of the training data.
dkarlovi@reddit
Maybe, but does it currently? No.
Does it? Can you link the court decision and which jurisdiction is it?
kansetsupanikku@reddit
If it's any of the top 10 vendors of LLMs that can be adjusted to the code, then yes, it does.
And I live under statutory law, where the lobbying process is, fortunately, longer. And reality matters slightly more. Any academic-level handbook that discusses both data processing and neural networks would support this phrasing.
But going by a quick search into the USA madness - GEMA vs OpenAI? Regardless, there will be more to come. GPL violations are famously underreported, and the issue is relatively new, as usefulness of agent systems keeps improving.
dkarlovi@reddit
I'm not seeing any links in your reply, where are the court decisions backing your claims?
Which court decision exactly are you basing this statement on?
kansetsupanikku@reddit
I'm not basing my statement on court decisions, because where I live, court decisions don't establish precedents
My claims are scientific rather than legal, since such is my background. And I would repeat them both when giving a lecture and when called as a court expert. Of course, what to do with such claims, is up to the judge, which, I have to admit - I am not.
dkarlovi@reddit
Licenses are legal documents, not academic ones. It doesn't matter what academia thinks if it's not backed by anything; it's questionable whether academia would even agree here (from my experience with academia people, I'm guessing no).
Are you right in your thinking, should it work like you're describing? Maybe, probably yes. But currently it does not, GPL and other software licenses are, as it currently stands, apparently worthless against this because they didn't anticipate this sort of development.
The key point is: licenses protect a specific instance of the solution, because up until now, that instance was what's valuable due to the prohibitive price of reaching parity. Now that math is upside down, because reaching parity (at least superficially) with a codebase is a basic prompting exercise, given adequate tooling. Nobody can say what's legal and what's not; the fact remains that everyone is assuming it is legal unless proven otherwise, which it currently isn't.
atred@reddit
"Clean room" is not a legal requirement, it's a precaution that some people took to say "see, I wrote that without even reading the code"
panick21@reddit
Do people not know that the first Unix utilities that were opened up were on BSD, and that the GNU utilities are the rewrite?
Why? If somebody is writing something new, how is that a problem? Does the original code vanish?
bigbearandy@reddit
In the old days of OSS, the big software companies had the idea of "embrace and extend," taking over open standards by extending their capabilities so that they could only be served by licensed commercial software. The new open secret is "embrace and smother": just like systemd, create systems that are complicated, tightly coupled, and don't easily allow for improvement, under OSS licenses that allow for commercial use. This is why Microsoft won't fund OSS projects that use truly open licenses, like the GPL, and instead wants licenses that allow you to fork an existing project and make it a commercial one.
I'm beginning to half wonder if the push to Rust isn't related to such a strategy. I mean, the core utilities are fine the way they are; they've been vetted and debugged for security issues over decades. Rust doesn't really add anything but more limited interoperability, a more restrictive license, and some marginally useful memory protection.
srivasta@reddit
Vibe coding any software seems to fulfill RMS' original vision of being able to share any software with his friends. I mean, that is the origin of the whole free software movement.
We can replace any proprietary software now and share it. (I still use the AGPL for my software, but that's just because I like the idea of people feeding back enhancements.)
MustUnderstandTrains@reddit
Do you know what "GNU" stands for and why?
Desertcow@reddit
Clean room rewrites are fair use. This goes for proprietary software as well, which is why projects like WINE are able to reimplement the way Windows does things despite Windows itself being proprietary
PAJW@reddit
No, they are not. Clean room rewrites totally circumvent the copyright, rather than using the fair use exemption. In the same way that if I wrote a biography about Lady Gaga, my copyright would not depend on any authors who came before me.
mikeymop@reddit
Rust coreutils is only coreutils in name.
It's an entirely different set of applications.
aukkras@reddit
Yeah, free software is dead, nothing is protected anymore - expect more enshittification of linux by corporations.
hollowaykeanho@reddit
I'm not a lawyer, nor am I one of the project's developers.
If I remember correctly, you need a minimum of 2 independent people to prove an algorithm is easily thinkable and reproducible without looking at the original code at all. This proves that the new and old productions came from the same idea, not from infringing each other's work (code).
Then do a clean write. If you can, write from scratch without looking at the source code. Obviously you don't let the two productions interact with one another.
The above describes the pre-AI era. Read up on "BusyBox vs. ToyBox". Do note that copyright is one thing, algorithm patents are another, and trademark is an entirely different game. You can still get sued for patent/trademark infringement even if you clean-write everything. Consult your IP lawyer first.
Also, copyright lawsuits are a rich people's game (think of the music industry's dramatic civil cases). OSS is known for its lack of funding. Pulling such a chess move against each other only exposes the funding contradiction and immediately drives people to completely avoid that OSS project and its developers. No one likes getting sued out of the blue.
Lastly, coreutils are commodity software by design, so choosing a permissive license makes a lot more strategic sense than GNU. You want everyone (proprietary or GNU) to use the same codebase and avoid fragmentation. You don't want to repeat the "BusyBox vs ToyBox" cases again. (Basically, GNU folks kept suing people until Android folks had no choice but to reinvent it as ToyBox, making ToyBox the current widely used version outside of PC/server.)
Oflameo@reddit
Any project, closed or open, can be rewritten with a different license.
papercrane@reddit
Many of the utilities in GNU coreutils are re-implementations of differently licensed tools. The history of many of the tools goes back to closed-source implementations released 50+ years ago.
mina86ng@reddit
See chardet issue #327.
__konrad@reddit
chardet project status: toxic
NatoBoram@reddit
Of course. There's even a service to clean-room slopfork existing open source projects: https://malus.sh
Serialtorrenter@reddit
Yes. The reverse is also possible; Linux was originally an open-source reimplementation of Unix APIs. Similarly, WINE reimplements win32 APIs. In the US, Google's victory in Google v. Oracle affirms that these types of projects are kosher.
WorBlux@reddit
The first thing to understand is that copyright is first and foremost for the protection of "creative works"
Copyright does not cover purely factual or functional elements within a work. The API of a program is usually regarded as a functional element rather than a creative one.
If a second program comes along and replicates the API, then it is not considered infringing or a derivative work simply on the basis that it provides the same functional elements as the first.
throwaway6560192@reddit
The GNU coreutils are far from the only, or first, coreutils in existence. It does not follow that any new implementations of coreutils must be a copy or translation of the GNU ones.
Altruistic-Rice-5567@reddit
Code is copyright-able. An algorithm (a process of doing something) is patent-able. As long as you don't copy code you can rewrite it as long as you are not creating code that runs a patented process. The ZFS filesystem presents a patent problem. Parts of what it does were patented for its process in general. People can write clean code that duplicates the behavior of ZFS without ever having seen the original commercial code. This would be legal. But in doing so you would be implementing a patented algorithm (otherwise its behavior is different) and that requires licensing.
Patented algorithms are pretty rare and difficult to obtain. They usually aren't a factor. So, yes, almost all commercial and open-source projects can simply be rewritten/duplicated and the author of the new code can choose whatever license they wish to apply.
yahbluez@reddit
The answer is very simple: if you REWRITE something, it is your work and not a derivative work, so you can choose any license you like.
MelioraXI@reddit
Depends on how you rewrite it. If it's copied with minor edits, you might violate the license.
Clean room is a different story.
Demented_CEO@reddit
What counts as "clean room" needs to be clearly defined in legal terms, as well!
E.g. giving an LLM access to the source code means it can infer what the various functions are supposed to do and then craft its own version from "scratch". That wouldn't be clean room, would it?
MrMelon54@reddit
IANAL, but an LLM without direct access to the source should not be considered clean room, as there is a high probability it was trained on the original open source code anyway.
Laerson123@reddit
It doesn't matter if it was trained on the original code, LLMs are generalists.
There's this false assumption that LLMs can only output what was used in their training data. However, if that were the case, LLMs would be terrible at doing what they are supposed to do.
The goal of ML is for the agent to be able to generalize to correct output based on a small sample. A good analogy is an agent trained to decide whether a photo is a photo of a dog: a model that can only output what it saw in the training data would be an agent that fails when shown any photo outside of the training dataset.
If you tell an AI coding agent to implement some open source application from scratch, it doesn't have the original source code embedded in its model; it will follow a step-by-step "reasoning" process to determine the functionality it needs to implement, and how to implement it. Even if the same prompt and same model are used twice, there is a high chance that the programs will be different.
KittensInc@reddit
When it comes to rewriting software, it is a huge problem that it can output snippets of its training data at all.
If your training data contains any part of the original source code, you are unable to guarantee that there is no license infringement. Did it copy/paste a code snippet, or did it simply come to the same logical conclusion given the specs? Nobody knows!
It's no better than a programmer who read the original code and then wrote a new version, while making a pinky promise that they didn't use any of the knowledge gained during the rewrite: cute, but your lawyers will probably tell you that it is a Really Bad Idea.
MrMelon54@reddit
Why do we force humans to do clean room style coding (with two developers) to reimplement the original logic instead of just letting them look at the original code and write it again from scratch? There is a legal distinction between implementing new code to fit the same function signature and taking the old code and rewriting exactly the same logic to be different from the old code.
chemistric@reddit
Clean room is not a legal requirement, it is just one technique to avoid copyright issues.
You can create a new implementation of a project even if you're very familiar with the original source. In that case it is not a clean room implementation, but that doesn't necessarily mean it's a derivative work or infringing copyright. It just becomes much more difficult to prove that you did not use any of the original source.
With LLMs it's tricky, since even if you didn't give it the original source, it could still have seen it in its training, so you can't generally do clean room with that. With the original Copilot, there were many examples of people getting it to output copyrighted code from its training data as-is. So for an LLM, you'd either need to ensure it never saw the original source as part of its training (very difficult for projects with public source), or use a different approach to prove that the generated code is not derived from the original source.
That said, the burden of proof is likely on the person claiming copyright infringement, not on you rewriting the project. It may just be difficult to defend yourself.
randomperson_a1@reddit
Not a lawyer, but it seems easy to argue that's a derivative work.
yahbluez@reddit
If it is easy, you may try to explain why, for example, writing a tool like ping again with modern tools, one that does the same as the old one did but without using any of the old one's code, would be a derivative work.
Decades ago the GNU tools came into the world using this method. It was handled in courts many times, and it looks like some people today do not know anything about that.
Those who rewrite the GNU tools today are doing the same thing the GNU tools did in the beginning when rewriting the original UNIX tools.
While the originals are ⓒ protected, the GNU tools chose the GPL license.
Rewriting the old stuff in Rust is a good move. Not sure if the move to MIT is good too; that's a question for lawyers.
FranticBronchitis@reddit
That seems plausible, actually. A common form of clean room engineering today is to have someone go over the product and draft a specification for its behaviour, and then get someone unaffiliated to actually implement it.
If that's allowed, I don't see why swapping out the humans for LLMs would make it illegal
but not a lawyer
yahbluez@reddit
This is a good point. A lot of people still "think" that LLMs are just parrots because they do not understand what they actually do.
FranticBronchitis@reddit
It'd be clean room if you used two different LLMs
One that can distill a behaviour specification for the software and one to implement it
That's usually how it works: you have someone draft the spec and someone unaffiliated implement it, afaik
Also not a lawyer
OrwellianDenigrate@reddit (OP)
I was also wondering about this, you could even tell the AI to produce code that shouldn't look like the original.
yahbluez@reddit
yah, sure, but C&P is not a rewrite.
A rewrite is a re-write, that is, writing code that does the same thing without just doing a C&P.
mrlinkwii@reddit
they always could be
Pale_Hovercraft333@reddit
Yes. This was created as a joke, but it's not really a joke: https://malus.sh/
FriendlyProblem1234@reddit
Nobody is rewriting anything. This is just a completely separate project developed by an entirely different person. The "original" project (quotes, because there have existed a bunch of coreutils alternatives for decades) is entirely unaffected by it.
MatchingTurret@reddit
I mean, that's how we got the PC architecture of today. Columbia Data Products reverse engineered the IBM PC BIOS and was able to produce compatible PCs without IBM's intellectual property.
JustBadPlaya@reddit
Rust coreutils are effectively a clean room implementation, as the team doesn't want to have problems with GNU's legal team.
A rewrite that still copies pieces of the original source code is subject to legal scrutiny and lawsuits, but a clean room rewrite that only goes for feature parity without touching the original is not.
Funnily enough, even though copying code is problematic, reusing GNU's test suite is not, as that's a separate piece of code, so the uutils team does so.
thomasfr@reddit
I mean, they don't really have good compatibility with either the POSIX standard or GNU coreutils (which are not 100% POSIX compliant), so it's not really a good implementation of anything that already exists.
Putting that aside, they don't really need to reference the GNU Coreutils source; they can look at the POSIX standard docs and/or any of the open source Unices, which already have their source code under what's probably a more MIT-compatible license.
So for this specific project they have ways to go that don't involve GNU coreutils if they are fine with breaking compatibility, which they are, since the Rust coreutils versions are totally broken from a compatibility perspective.
BothAdhesiveness9265@reddit
Do you have a source for them not having good compatibility? According to the uutils GitHub they pass 94% of the tests. Plus, Canonical used them in Ubuntu 25.10 with, from what I understand, only a handful of issues (since fixed).
thomasfr@reddit
I remembered that I wrote about this recently on reddit. Here is my general take and a very serious defect in mv: https://www.reddit.com/r/rust/comments/1s43bn4/comment/oclhglx/
thomasfr@reddit
Just look at their issue tracker: you can find anything from lots of small incompatibilities to data-loss-potential-level stuff for the tools that directly modify data.
kopsis@reddit
The legal gray area with AI rewriting is whether it's truly a "clean room" re-implementation if the LLM was trained using any of the source code it's rewriting. Only the courts have the authority to definitively answer that.
TomDuhamel@reddit
You appear to be under the impression that this is a translation. Like how a novel can be translated from French to English to sell it to a new audience. In this case, this is covered by copyright and you need a permission (typically a contract) to proceed.
These utils are not a translation. It's not looking at the source and rewriting the same code in a new language. First of all, that's not how it works; C and Rust are very different languages, and you couldn't just rewrite an app from one to the other directly. That wouldn't work.
The utils are rewritten, from scratch. It's basically redoing the same functionalities, from scratch, in a new language. It's not saying that they never looked at the source, but these do mostly quite rudimentary tasks that are easy to implement.
Copyright doesn't cover what a program does, it covers the final program, code or binary.
If someone wants to redo the Linux kernel in Rust, they are absolutely free to do it. And under the licence of their choosing. They just aren't allowed to call it Linux.
rebootyourbrainstem@reddit
Rewrite is kind of a fuzzy term. But yeah as long as you don't base it on the existing code and it just works the same, it's fine.
In fact, some BSDs and even Windows (Windows Subsystem for Linux 1.0, before they switched to a "real" Linux kernel in a VM) have implemented Linux compatibility layers in the kernel that go far beyond generalized POSIX/Unix compat. The difference between that and a "rewrite" becomes really subtle.
0xe1e10d68@reddit
You are always able to reimplement any code without violating copyright. Copyright only protects the work, but not the process or technique (patents do that). So you’re allowed to achieve the same results, even using the same methods, as long as you don’t plagiarize the original work.
Basically you try to imitate the behavior of the original program 1:1 (or less if you don’t need an exact match) independently.
Severe_Stranger_5050@reddit
It’s only the code, not the functionality, that’s actually copyrighted.
So if you rewrite an application, script or a function in a “clean room” with none of the original code, then you’re the rights holder and can license it however you want.
Although you might run into design, trademark and patent issues but that’s beyond the scope of the question.