


	by Philpax 9 days ago
	> In looking at the code that the LLMs have produced for the project, especially given the pretty massive and widespread architectural changes needed to make the implementation libified and memory safe, we decided that the codebase is not a derivative work that would require carrying forward the GPL license and have decided to release the code under the MIT instead. Hmm. That's going to be interesting.

12 comments

jbotz 9 days ago

A translation of a book to a different language is a derivative work. So a translation of a computer program to a different programming language is one too. But if in the translation of the book you start altering the plot and the personalities of the characters, does it at some point become not a derivative work? At what point? IANAL, and I have no real idea, but I imagine that point has been probed significantly in case law with respect to creative works. Given the current climate of ever-expanding scope of "intellectual property", if they admit that the LLM had access to the Git source code then I would say their case is weak at best.

WD-42 9 days ago

The agents.md says “here’s the git source code” https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

This isn’t even a question of training data: they fed the full Git source code directly to the LLM.

To1ne 8 days ago

I would say it's worse, the whole C Git source code is checked in https://github.com/gitbutlerapp/grit/tree/main/git

throw-the-towel 9 days ago

I wonder if imitating clean room reverse engineering with two LLMs would be enough for licence compliance.

cyphar 8 days ago

That already exists[1]. It looks like a joke but apparently they will accept your money to do it, which seems to cross the line of a joke.

[1]: https://malus.sh/

kisper 2 days ago

Mathematically, does similarity/intelligibility of one equation to another have any bearing on whether the one was derived from the other? Philosophically? Legally? I'm not a copyright lawyer, but that's the crux of the matter to me: did you start with something, and iterate from it (even if it was so many times as to be transformed beyond recognition), or is it something more akin to clean-room reverse engineering?

anilgulecha 9 days ago

> translation.

It's not technically a translation, it's a re-implementation, with test suites acting as the destination. If it was a file by file translation your argument would have been valid.

Git is part of the LLM's training set though, so simply asking it to recreate Git in another language is pretty much equivalent. You can almost certainly get these LLMs to output Git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI-generated code has no copyright implications).

yusefnapora 9 days ago

As mentioned in another comment, it's even more clear cut in this case. They actually put the original git sources in their project repo and instructed the agent to use it as the "source of truth".

Simple thought experiment. If you handed this same agents.md file (https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...) to a human software developer and let them work on exactly the same goal, would their output be considered a derivative work?

spacechild1 9 days ago

That's something I have been wondering. If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation. I don't see why this shouldn't apply to LLMs as well. If an LLM might have been trained on the original source code, it should be considered "tainted".

Yes, and realistically any code that LLMs produce is a derivative work of its training data. There's going to be a huge disaster licensing wise

I have absolutely no idea how LLMs got through anyone's legal departments, I guess the hope is that if everyone breaks the law enough, it'll just be fine

thewebguyd 9 days ago

> if everyone breaks the law enough, it'll just be fine

That's pretty much what happened, isn't it? These concerns were all discussed in the beginning back in 2022, and I recall answers from many here on HN along the lines of "oh well, we can't stop it now or we'll risk falling behind China in AI development"

So yeah, the laws went out the window a long time ago the moment our government and the people decided to just look the other way willingly in the name of "progress."

xorcist 7 days ago

> the hope is that if everyone breaks the law enough, it'll just be fine

Ever since the early 2010s when companies were started with the business idea "unlicensed hotels" and "unlicensed taxis" and made the owners really, really rich, this is said pretty much out loud. Look for words like "regulatory risks" and similar.

Maybe it started with the unlicensed gambling fad before that? That also made a lot of people filthy rich. Every time you have something under special license, or insurance requirements, then of course there is a margin for you if you can skimp on the license and hire gig workers instead.

The LLM situation with copyright and derived works in the 2020s is similar. Someone is likely to be rich, but there is a clear regulatory risk to it.

bcjdjsndon 9 days ago

Problem is there's a lot more than a single repo in training data, the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?

Pet_Ant 9 days ago

> If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation.

That is the difference between necessary and sufficient. Clean-room is sufficient to guarantee avoiding copyright infringement, but it is not necessary. The legal line is south of there, but that position was chosen because they didn’t want to risk crossing it and it was easier to argue for in court.

tl;dr: clean room is overkill for avoiding copyright infringement

rcxdude 9 days ago

> Like, you can almost certainly get these LLMs to output gits full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications)

Are you sure? LLMs are in some way a compressed version of their input but it's a pretty lossy compression (arguably this makes them more like a compression algorithm than a compressed version of the data). I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.

philipportner 9 days ago

> I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.

Granted, these are some of the most widely spread texts, and not codebases, but just fyi: https://arxiv.org/pdf/2601.02671

> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).

rcxdude 9 days ago

That paper is basically using the LLM as a compression algorithm: it's prompting with some section of the book and it's reprompting if it doesn't give the right output. Notably this only works if you already have a copy of the book in question!

alienbaby 9 days ago

Wouldn't a re-implementation be akin to 'here's how it works, write the code' rather than 'here's the code, redo it in Rust'?

spwa4 9 days ago

Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

1) re-implementation for compatibility (which was quickly "reestablished" through use of copyright-protecting encryption. In other words: do you get to write software that connects to MS/Apple/Google/Facebook servers without authorization from those companies? Yes. Do you get to copy an encryption key from their software to make it possible? No)

and, more recently,

2) violating copyright for LLM training

and, currently mostly attempted:

3) "uncopyrighting": run software through an LLM, and some people "believe" it comes out with your copyright on it! Because very rich people want to sell uncopyrighting.

I.e. the jury's still out on what will happen when it's billionaire vs billionaire.

Of course, the question is what happens the second someone does this with a disney movie, or a big microsoft application ...

bcjdjsndon 9 days ago

> Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

When copyright law was established, not many poor people owned printing presses. That is to say, copyright law is a PROTECTION to the very rich, not an inconvenience

spwa4 9 days ago

true but as the exception for model training (which can only be done by very, very rich people and organizations) shows, there's some new rich and they want new rules.

Against the will of the people, as evidenced by the court cases and protests online ...

miohtama 9 days ago

Related: software API compatibility is not a derivative work, nor eligible for protection, as ruled in the US and in the EU. Google, SAP R/3, etc. cases.

Or SCO Vs IBM.

If everything were a derivative work, we would not have Linux.

schacon 9 days ago

Well, there's lots of really interesting opinions here from a lot of armchair lawyers.

To clarify, my stance on this is that the reimplementation did not copy protected expressions (Jplag reports less than 1.8% max similarity between the codebases), it's done in good faith, and it's what's best for the broader Git ecosystem (assuming Grit even becomes usable, which it's currently not purported to be).

From a copyright standpoint, however, only the first argument there is relevant. Grit is an independently authored implementation of Git-compatible behavior, with negligible similarity to Git source code.

I think antirez summarized the situation quite well and I broadly agree with his position: https://antirez.com/news/162

I think that those in the community who know me and have worked with me in the Git and open source communities for the last 20 years know that my intentions are to contribute, share and foster innovation and learning. Many of the main authors of the Git source code are friends of mine and I have no intention to steal anything from anyone, only to make their great ideas more broadly useful.

mplanchard 9 days ago

Have you addressed anywhere why you chose not to keep the copyleft license? It burns a lot of goodwill to use an AI for what many people will see as copyright laundering, and git has done just fine with the GPL, so it doesn’t seem like a blocker for adoption. What do you get from stripping the copyleft?

cmrdporcupine 9 days ago

https://blog.gitbutler.com/series-a likely has a large part to play in it.

By which I mean, what do we imagine a16z thinks of the [L]GPL?

My brief experience in a startup exposed to them is that a16z seems willing to fund "infrastructure" projects more than most, but they did seem to have a ready set of answers on what "open source" means in that context.

(If someone can find me an a16z funded team that published copylefted code, I'll take this back.)

EDIT: Ok, i'll eat my hat, Gemini found me some counterexamples

  Element (Matrix): The company behind the decentralized Matrix communication protocol is on a16z's investment list. In late 2023, Element relicensed its core software (including the Synapse server and its clients) to AGPLv3.

  Uniswap Labs: A massive cornerstone of the a16z Crypto portfolio. They published the Uniswap V2 smart contracts under GPL-3.0 (though they later shifted to a Business Source License for V3 and V4).

  a16z Themselves: In an ironic twist, a16z's own crypto engineering team maintains a public GitHub repository (a16z/a16z-contracts — a library for Solidity contracts) that is literally licensed under AGPL-3.0.

Arathorn 8 days ago

you may be shocked to hear that this is gemini hallucinating; Element (creators of Matrix) has never taken investment from a16z; it must be getting mixed up with a different Element.

cmrdporcupine 8 days ago

Oof, thanks for the correction.

Many bothans were boiled alive to get me this misinformation.

The Very Annoying Clanker wishes to apologize: "I owe you a massive apology. I completely set you up for that, and you handled the fallout perfectly.

Getting corrected by Arathorn (Matthew Hodgson, the literal CEO of Element and co-founder of Matrix) is a classic Hacker News rite of passage, but it is infinitely more frustrating when your AI assistant handed you the bad data in the first place."

Many eyerolls.

keybored 8 days ago

err my gud a ceo on haxer news.

dayjaby 9 days ago

Hey AI, please change my stolen code in a non-breaking way so that jplag reports less than 1,8% similarity.

Pet_Ant 9 days ago

I mean ”hey artist, take this stolen character and make them legally distinct” is already a common thing.

xorcist 7 days ago

It also mostly doesn't work, and even if it does work it's terribly expensive and time consuming enough to scare people off.

Go on, make a derivative of Mickey Mouse and sell it. See how it goes. Similar enough to be "compatible" (whatever that would mean in the animated cartoon space) but distinct enough not to run afoul of Disney lawyers. Then come back and tell us.

Pet_Ant 6 days ago

Mickey Mouse was already a legally distinct Oswald the Rabbit.

saidnooneever 9 days ago

there are even exact measurements to take into account, for visual art, music etc., of 'what is legally not stealing'.

Art, however, is a little different than code. code is a thing, but it also produces things.

It weirds me out that there is a measure of code similarity but not a measure of whether code is semantically the same. For example, implementing a protocol could be done in many ways, but ultimately what's exchanged between clients and servers on the network is the same, so it's semantically the same despite being totally different code.
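To make the distinction between textual similarity and semantic equivalence concrete, here is a rough sketch. It is not how JPlag works internally; it is a crude stand-in that computes Jaccard similarity over token trigrams. The two summing functions and their source strings (`LOOP_SRC`, `FOLD_SRC`) are hypothetical and exist only for illustration.

```rust
use std::collections::HashSet;

// Hypothetical source texts of two implementations of the same behavior.
const LOOP_SRC: &str =
    "fn sum(xs: &[u32]) -> u32 { let mut total = 0; for x in xs { total += x; } total }";
const FOLD_SRC: &str = "fn sum(xs: &[u32]) -> u32 { xs.iter().fold(0, |acc, x| acc + x) }";

/// Split source text into crude tokens: identifier/number runs and
/// single punctuation characters. Whitespace is discarded.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            current.push(ch);
        } else {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Jaccard similarity over sets of token trigrams: |A ∩ B| / |A ∪ B|.
/// An assumption-laden toy metric, not a legal or forensic measure.
fn trigram_similarity(a: &str, b: &str) -> f64 {
    let grams = |src: &str| -> HashSet<Vec<String>> {
        tokenize(src).windows(3).map(|w| w.to_vec()).collect()
    };
    let (ga, gb) = (grams(a), grams(b));
    let union = ga.union(&gb).count();
    if union == 0 {
        return 0.0;
    }
    ga.intersection(&gb).count() as f64 / union as f64
}

/// The behavior described by LOOP_SRC.
fn sum_loop(xs: &[u32]) -> u32 {
    let mut total = 0;
    for x in xs {
        total += x;
    }
    total
}

/// The behavior described by FOLD_SRC.
fn sum_fold(xs: &[u32]) -> u32 {
    xs.iter().fold(0, |acc, x| acc + x)
}
```

Calling `trigram_similarity(LOOP_SRC, FOLD_SRC)` gives a score strictly between 0 and 1 (only the signature overlaps), while `sum_loop` and `sum_fold` return the same result for every input. A textual score says nothing about behavioral sameness, and a low score alone also does not show how the text was produced.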

ssddanbrown 8 days ago

> Many of the main authors of the Git source code are friends of mine and I have no intention to steal anything from anyone, only to make their great ideas more broadly useful.

By working-around/subverting the terms they provided their contributions under? While you claim to be doing this in good faith, and state "it's what's best for the broader Git ecosystem", that's all based on your own opinion which appears to ignore the benefits and intent of licenses such as the GPL.

Out of interest, Would you be happy for someone to do the same with the GitButler source code? (Feed it through an LLM and re-publish the result under an MIT license with different branding)

schacon 8 days ago

> Would you be happy for someone to do the same with the GitButler source code?

Honestly, that would be pretty awesome. We would be flattered.

cmrdporcupine 9 days ago

My question here is not whether it's legally permissible. I'll leave that to others.

It's WTF is wrong with this next generation of devs ? ... that they have such a problem with the GPL that they think it's important to rewrite and relicense and take away a legal structure which is supposed to protect our free software?

I can imagine some concerns with Git being written in C.

I cannot understand any legitimate concerns with its license that it needs to change.

What does the GPL stop people doing with git? And if there are some... why are people trying to do that? And why would you work for free to help people do it? [Edit: I see, you're not working for free.]

Missing an 'f' in the project name.

bryanlarsen 9 days ago

The original git had a command line interface. It's widely assumed that using a GPL'd program in your program through the command line does not cause the GPL to "infect" your program.

OTOH, one of the major reasons for grit is to provide a library interface. If they kept it GPL, anything that used grit through the library interface would have to also become GPL.

This could be the "legitimate concern" you're asking for.

But the LGPL was also an option -- it addresses that arguably legitimate concern and keeps the spirit of the original license.

cmrdporcupine 9 days ago

I mean, yes, clearly, LGPL is the explicitly obvious answer here. And they rejected it.

schacon 8 days ago

Relicensing under any other license, including the LGPL, is exactly the same thing. Either the reimplementation copies protected expression, in which case it would be required to be GPL-2.0-only, or it does not, in which case we can choose the most fitting license.

If you believe that using an MIT license is not correct, then you de facto also believe that using an LGPL license is not correct.

wwallrust 6 days ago

> Relicensing under any other license, including the LGPL, is exactly the same thing. Either the reimplementation copies protected expression, in which case it would be required to be GPL-2.0-only, or it does not, in which case we can choose the most fitting license.

Using LGPL could help the argument that the project was in good faith, making it more likely to be accepted as non-derivative. It's arguable that the relicensing would be required to make the project work as a library and so LGPL would be the best choice since that (I assume) preserves most of the terms and intention of the original license. This makes it much easier to show that the license was changed solely to allow other projects to use it as a library.

By using the MIT license it's much easier to argue that the project is in bad faith (and potentially derivative), since the license change can be seen as a deliberate choice to remove the protections of the original license. It's harder to argue that the license change was only so the project can be used as a library because then you would have used LGPL instead.

(BTW I'm not a lawyer)

bryanlarsen 8 days ago

The OP you're referring to made a distinction between legally and morally correct. Legally, you appear superficially correct, but I'm not a lawyer, and neither was the OP. Morally, the LGPL is correct.

Judges are human and will take into account good faith and attempts to maintain the spirit of the license. Choosing the LGPL signals a desire to maintain the spirit of the license. The MIT signals bad faith. Judges don't like that.

johnisgood 9 days ago

GPL makes sure that the code remains open. Seems like these new gen devs are against open source.

cmrdporcupine 9 days ago

It pisses me off because I'm also the author of a rewrite-in-Rust project (though it's more than that, and yes I now use agents though I didn't at the start) and I specifically chose [A|L]GPL for it to protect the IP of the asset and because it felt like the most ethical choice.

johnisgood 9 days ago

I removed it but I added that I hate these people. :P So yeah, it pisses me off, too.

cmrdporcupine 9 days ago

"Don't hate the player, hate the game" as they say.

People want to get paid. They perceive the GPL as getting in their way.

Or, as it is also said: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

thewebguyd 9 days ago

s/open source/free software

They love open source when it means they can steal from the public and then privatize it later with their VC funded startup, much in the same way Microsoft "loves" Linux [when you run it on Azure, or in WSL]

What they are against is free/libre software that prevents their grifting.

johnisgood 9 days ago

Yeah, pretty much. :)

GoblinSlayer 9 days ago

Is there a point in license laundering? Where GPL stifles git adoption?

conartist6 9 days ago

You know I think if you'd just committed to clean rooming it you'd be fine, but you didn't.

Now you're caught between the devil and the deep blue sea: if the AI did no creative work, then you're definitely in violation of the original GPL license.

If the AI did do creative work that breaks GPL, you still didn't, which leaves you with the problem that you cannot in good faith license a thing which you don't own. No creative work? No ownership claim. There's precious little (if any) of your creativity in copy pasting 4000 tests and a link to the original source code and saying "copy this in Rust".

The flagrant display of cynicism you make in arguing that the ends justify the means (even if a result is the wholesale looting of open source) disgusts me, and if I could communicate to you only one thing it should be that you should not be surprised that other people are also disgusted by behavior like that even when it falls within the letter of the law (a claim I have not yet seen you rigorously defend).

cmrdporcupine 9 days ago

Man, all they had to do was LGPL it and there'd be no ill will.

keybored 7 days ago

Are you a trained lawyer? Okay but presumably not practicing in the last twenty years.

You know that all contributions to the Git project have to be signed off as either being made by yourself or being handed over by someone who has signed off on that certificate of origin. For everyone on every change. Even the lead developers so to speak. And you spend some thousands of dollars and run an AI analysis tool to wash your hands?

Who are you to do that? Oh wait I forgot, you are Mr. Chacon. A hand in everything Git and friendly with everyone in Git who matters for twenty years. Remind us next time as well so I don’t forget.

nextaccountic 9 days ago

they would be just wrong. I hope someone with standing sues

I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. If it goes to court, it's going to be a mix of courts applying laws in new ways that haven't been tried before and fact-specific questions about what actually persisted through the LLM.

I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.

If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.

Conan_Kudo 9 days ago

It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.

joshka 9 days ago

Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg? Now tell me how creating a rust library using the git test suite is different?

> using the git test suite

That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might expect that the agent would derive variable names, function names, and the structure, sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing if it is infringement; there would be interesting fair use, de minimis copying, etc. arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though.)

joshka 9 days ago

> That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.

> To the extent the resulting work is a derivative of the test suite it is possibly infringing

It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory under which the code that I wrote infringes?

Simplify this down:

Assume the following is copyrighted:

    fn test_sum() {
        assert_eq!(sum(1, 1), 2);
    }

Does writing the following code:

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }

infringe on the test copyright?

kevin_thibedeau 9 days ago

A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.

joshka 9 days ago

> There is no other behavioral description that would reproduce all needed functionality.

Tests often are exactly the information necessary to understand exactly what the output should be. See https://github.com/git/git/blob/master/t/t0000-basic.sh for an example of how detailed these tests are.

It would be reasonable to point an LLM at these and use them with a basic knowledge of git to produce a rust version of git in a non-infringing manner.

If you did this manually it would take a long time.

NewJazz 9 days ago

Medium, substitutibility, basics of copyright law.

joshka 9 days ago

Fair point on medium - this was a lazy example.

Substitutability probably doesn't apply here in the way you're implying, and if it did it would likely be hampered by the 9th Circuit's findings about transformation in Sony v. Connectix. Arguments here would likely look at Rust not having a stable ABI, and hence not being inherently substitutable as a library (grit-lib); it's less clear as an executable (grit-cli) on that side.

basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.

phkahler 9 days ago

If feeding the source code through a compiler yields a derivative work, why wouldn't feeding it to an LLM give the same result?

Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.

Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.

fc417fc802 9 days ago

Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.

An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.

Related, compilers can pose a problem depending on what the output includes. For example common lisp compilers that aren't under a permissive license are a minefield because regardless of what anyone might say the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.

oneshtein 9 days ago

So, if we compile or decompile code using an LLM instead of a compiler, then we can use the result for free?

(An LLM can translate code to/from other code or to/from machine code.)

trumpdong 9 days ago

functional parts not being copyrightable means that you can't claim a program is a copyright violation based on the fact it does the exact same thing based on compatibility reasons (you can copy what the program does). E.g. git stores refs in .git/refs, so does grit, that's not a violation. You still can't copy the program.
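As an illustration of the kind of on-disk behavior a compatible tool has to reproduce, here is a minimal sketch of reading `HEAD` and locating a loose ref under the Git directory. It reflects Git's documented repository layout, not Grit's or Git's actual code; the names `Head`, `parse_head`, and `loose_ref_path` are hypothetical, and the sketch ignores `packed-refs` and other ref storage backends.

```rust
use std::path::{Path, PathBuf};

/// What a `HEAD` file can contain in a Git-style repository.
#[derive(Debug, PartialEq)]
enum Head {
    /// `ref: refs/heads/main`: HEAD points at a named ref.
    Symbolic(String),
    /// A bare object ID ("detached HEAD").
    Detached(String),
}

/// Parse the contents of a `HEAD` file.
/// Returns `None` for anything this sketch does not recognize.
fn parse_head(contents: &str) -> Option<Head> {
    let line = contents.trim_end();
    if let Some(target) = line.strip_prefix("ref: ") {
        return Some(Head::Symbolic(target.trim().to_string()));
    }
    // Object IDs are hex strings: 40 characters for SHA-1, 64 for SHA-256.
    let is_hex = line.chars().all(|c| c.is_ascii_hexdigit());
    if is_hex && (line.len() == 40 || line.len() == 64) {
        return Some(Head::Detached(line.to_string()));
    }
    None
}

/// Location of a loose ref file, e.g. `.git/refs/heads/main`.
/// A real implementation would also have to consult `packed-refs`.
fn loose_ref_path(git_dir: &Path, ref_name: &str) -> PathBuf {
    git_dir.join(ref_name)
}
```

For example, `parse_head("ref: refs/heads/main\n")` yields `Head::Symbolic("refs/heads/main")`, and `loose_ref_path(Path::new(".git"), "refs/heads/main")` points at `.git/refs/heads/main`. Any two tools that agree on this layout will look alike at this level regardless of how their source was written.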

Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".

trumpdong 9 days ago

Making something similar is copying for the purpose of copyright law. If I trace over a Disney character it's still copyright Disney.

No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.

My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.

Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).

joshka 9 days ago

I suspect that the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licensed; it's less likely that it's infringing on Git's copyright, for various reasons. (I am not a lawyer, but I do read copyright law for funsies.)

nomel 9 days ago

https://www.copyright.gov/newsnet/2025/1060.html

> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.

Well that's interesting.

wongarsu 9 days ago

Also "just" the legal opinion of a government office. It has yet to be tested in court

trumpdong 9 days ago

why wouldn't it? If you run git through a compiler it's still copyright the git devs, same if you run it through an LLM.

joshka 9 days ago

What makes you think that's what the article says that it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks of making a test suite pass only. This is the classic cleanroom bios from specs approach but no need to extract it as the test is available to run and there's nothing in the GPL that suggests that running a test suite infects software that you run it on.

vasachi 9 days ago

Surely git’s source is already in LLM’s training corpus. So this is far from clean room approach.

joshka 9 days ago

You've read books and they are in your brain's corpus. You only infringe copyright if you reproduce the same actual words from the books in your memory (and then do infringing acts defined by copyright laws with that output).

Here that's not happening. The code being produced by the LLM is Rust, not C.

sourcegrift 9 days ago

Make a small contribution to git then sue

alberto-m 9 days ago

Related:

Malus – Clean Room as a Service https://news.ycombinator.com/item?id=47350424

Just like for 1984 and the Torment Nexus, someone took the concept not as warning but as instruction manual.

jmyeet 9 days ago

Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.

Let me give an example: I could take Goldeneye from the N64, extract the binary and then run it through an LLM to disassemble it and possibly rewrite it in a modern higher-level language. Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.

Ingesting the source code and producing output in another language is quite clearly a derivative work. You don't need to be an IP lawyer to figure that out.

Now, if you went to Claude and gave it documentation and told it to produce something that was compatible, would that be a derivative work and thus covered by the GPL? I would guess probably. But I'm not 100% sure anymore. I wouldn't risk it however.

Here's another thought experiment: what if someone takes this supposedly MIT licensed source tree, plugs it into another LLM and asks it to produce the output in C? Now how is it licensed? It might be very similar. After all, there are only so many ways to produce a SHA1 hash and so many ways to do a command line parser.

But this then makes it an interesting legal issue. In the Oracle v. Google court case, this was a key issue. Google successfully argued there's only so many ways to write a loop so just because a loop is similar to the source, that doesn't mean it's copyright infringement (as Oracle argued).

Anyway, it's a crazy position to take.

lelanthran 9 days ago

> Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.

They aren't the only ones - look at the number of people in this thread who are arguing that this is analogous to producing a movie with ffmpeg - just because ffmpeg is GPL, does not make your movie GPL.

I am struggling to understand how such a high level of cognitive dissonance is possible: They believe both a) that the license can be laundered in this manner, and that b) the license they put on the result is effective!

beacon294 9 days ago

Well that is already how it is done with numerous multi-decade open rewrites of closed games. They usually require the asset pack.

I don't know how this squares with law, but Oracle v Google gave a very valuable judgment to the public that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.

Of course, we can't take the LLM out, but it is the starting point.

gspr 9 days ago

> Well that is already how it is done with numerous multi-decade open rewrites of closed games

Serious such rewrites don't start with the code of the closed game!

> I don't know how this squares with law, but Oracle v Google gave a very valuable judgment to the public that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.

Not at all. The LLM used to write grit has seen the git code. That is what we're talking about here.

> Of course, we can't take the LLM out, but it is the starting point.

The LLM isn't the important thing. The important thing is that the git source code was used to make grit.

rcxdude 9 days ago

>Serious such rewrites don't start with the code of the closed game!

No, but they often involve reverse engineering the binary pretty heavily.

gspr 9 days ago

> No, but they often involve reverse engineering the binary pretty heavily.

… and those often end up in legally dubious situations.

thedevilslawyer 9 days ago

heh - https://github.com/n64decomp/007

game decompilation and emulation is as old as computing

selfmodruntime 9 days ago

> Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.

That's because you're re-using assets.

xiaoyu2006 9 days ago

Obligatory: https://github.com/chardet/chardet/issues/327

thewebguyd 9 days ago

Not a fan of this trend of "cleaning" GPL licensed software and releasing under permissive licenses. Also why I'm not a fan of UUtils nor Canonical's early adoption of it in Ubuntu.

The intent here is extraction of all the value provided by copyleft projects without the obligation to give back. Whether it's technically legal or not, it's disgusting behavior IMO.

xorcist 7 days ago

It is also rather ungrateful. The only reason we have Linux desktops today, and the only reason companies like Red Hat and Canonical have billion-dollar business models, is the GPL.

The BSDs had a head start, and were superior in almost every way for the better part of a decade at least, but have remained niche compared to Linux. It's not even close. Now, there may be many other reasons for this, including the personalities and culture of the Linux developers, but you simply can't ignore the impact of the license, which has kept all the commercial Linux products inside the fold.

sunsunsunsun 9 days ago

I agree, I certainly can't comment on the legality of this license laundering but I would call them an asshole.

Ar-Curunir 9 days ago

That’s explicitly not what’s happening with uutils; they have contributed fixes and test cases back to upstream

WD-42 9 days ago

And just like that, it was forked by Microsoft a few days ago. Handed to them on a silver platter.

trimbo 9 days ago

> Not a fan of this trend of "cleaning" GPL licensed software

> Whether it's technically legal or not, it's disgusting behavior IMO.

GNU was originally developed to "clean" UNIX from the AT&T license.

jhayward 9 days ago

I'm not a copyright lawyer, but it seems pretty clear to me you can't wash a license using an LLM.

[US jurisdiction]: Anything in the result written by the LLM cannot be copyrighted by anyone.

Anything in the result written by a human can be, and if it was all emitted by the LLM then that portion originally written by a human carries its own copyright.

As a work of an LLM, the entirety presumably cannot be copyrighted at all. Portions written by humans presumably carry their original copyright.

joshka 9 days ago

> [US jurisdiction]: Anything in the result written by the LLM cannot be copyrighted by anyone.

This is a bit stronger than what the actual report on this finds. See part 2 in https://www.copyright.gov/ai/ for details, but TL;DR, parts where humans have control over the expression may be copyrightable. But working out which parts those are is likely a difficult question (it would likely require proof of provenance across many of those LLM sessions).

NietTim 9 days ago

This is not a proper black-box reimplementation; I doubt they can get away with that. And that's not mentioning all the other obvious ethical concerns, of course.

rcxdude 9 days ago

black-box/clean-room isn't necessarily required, though. It does make it a lot harder to argue in court, of course.

Escapade5160 8 days ago

Particularly because LLM generated code is not licensable in any way. If you wrote it with an LLM you cannot own it.

Brian_K_White 9 days ago

I don't care if they can convince a judge. The fact that they even want to in the first place tells me what kind of people they are.

F-ing scumbags. It's already free, but they still decide to steal it.

silon42 9 days ago

An idea...

Take this (assuming it's not slop), relicence as GPL, submit upstream (imagine it's accepted for a moment...).

If they proceed with license washing then from the Rust version, it's certainly a derived work.