by GreenDolphinSys 9 days ago
This is simply plagiarism of GPL-licensed code, and license-washing as well. I can understand working backwards from a test suite, but this literally just reads the original source:

https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

LLM users seem to live in another world where stealing everything that isn't bolted down, and passing it off as their own work, is acceptable.


I see it differently. I look at it as if I had written this code myself, using this same approach. Look at the docs, look at the tests, look at the source, implement something that is interactively compatible but a very different approach.

For example, this is exactly what I did when I tried to get SSH commit signing working properly in GitButler:

https://blog.gitbutler.com/signing-commits-in-git-explained

You can see in the post that I dug through the C source to figure out how it was canonically done and then implemented something that accomplished the same thing in Rust but without copying source code.

There are some similarities between the Grit Rust source and the Git source, but it's mostly around time/formatting type things or byte offset type things needed to make packfile parsing and whatnot work, but as far as I can tell, there is no straightforward copying of code. The approach needed to make this a reentrant, memory safe, library driven codebase is so different that copying is generally not useful. But nobody can _guess_ how packfiles or reftable binary formats are specified, since they're not really documented. I'm aware of this because I'm pretty sure I _personally_ am one of the only ones who has ever attempted to document the packfile binary format: https://schacon.github.io/gitbook/7_the_packfile.html

You have to read the source. Which means that libgit2 and Gitoxide and every other Git reimplementation is also "license-washing" per this definition because they also had to reference the Git source to see what the technical specification is.

If you find any code in Grit that is clearly line-for-line copied, please point it out and I will replace it. But the Git source is the Git specification and every reimplementation, LLM or not, is forced to use this approach to build anything compatible.

Yet, you didn't write the code yourself.

On Gitoxide: Given that the author read the docs and source code [0], and literally copied files over from the git source [1], it also is license-washing. At least libgit2 is GPLv2 with a linking exception. I don't think people would have much to say if these projects honored the original projects' intents and kept a copyleft GPL license. But they don't.

> The approach needed to make this a reentrant, memory safe, library driven codebase is so different that copying is generally not useful.

This is obvious given how different Rust is from most languages. So are licenses pointless as a concept now, because anyone can argue their Rust implementation of a GPL (or whatever) project is meaningfully different? Nice loophole there.

Stripping away the GPL in favor of MIT/ASL2.0 seems to be the trend for Rust projects (see uutils, etc). I'm really glad that we can make it easier for large companies to extract value from community labor and, in general, not contribute much of anything back.

0: https://github.com/GitoxideLabs/gitoxide/discussions/253

1: https://github.com/GitoxideLabs/gitoxide/issues/925

> I see it differently. I look at it as if I had written this code myself, using this same approach. Look at the docs, look at the tests, look at the source, implement something that is interactively compatible but a very different approach.

I could look at a C to Zig compiler in the same way: I read some C code, write the equivalent Zig code, repeat.

The compiler could also do some circumlocutions in order to provide an apparently different approach.

> I'm aware of this because I'm pretty sure I _personally_ am one of the only ones who has ever attempted to document the packfile binary format: https://schacon.github.io/gitbook/7_the_packfile.html

gitformat-pack?
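
For reference, a minimal sketch of reading the 12-byte pack header as gitformat-pack describes it: a 4-byte `PACK` signature, then a 4-byte version number and a 4-byte object count, both in network byte order. The function name `parse_pack_header` and the sample bytes are made up for illustration; this is not code from Git or Grit.

```python
import struct
from typing import Tuple


def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile.

    Hypothetical helper for illustration only.
    """
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # ">4sII": big-endian, 4 raw bytes, then two unsigned 32-bit integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    # gitformat-pack says Git accepts version 2 or 3.
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count


# Sample header built by hand: version 2, three objects.
sample = b"PACK" + struct.pack(">II", 2, 3)
print(parse_pack_header(sample))  # prints (2, 3)
```

Everything after those 12 bytes (the object entries and the trailing checksum) is where the format gets harder to follow from the docs alone.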

> If you find any code in Grit that is clearly line-for-line copied, please point it out

Please hunt for specific lines to disprove your bold claim.

> and I will replace it.

Assuming the current claims here, that would just be license washing with volunteer assistance.

> You have to read the source. Which means that libgit2 and Gitoxide and every other Git reimplementation is also "license-washing" per this definition because they also had to reference the Git source to see what the technical specification is.

This makes no sense:

1. A court might agree with you if a human read the sources, then wrote a new implementation. Doesn't apply to trade secrets (i.e. cleanroom implementations), but certainly for copyright.

2. A court is not going to agree that passing the original sources through a machine means you own the results!

I mean, that's what it comes down to - as far as the courts are concerned, passing copyrighted material through a machine results in output that retains the original copyright. Passing copyrighted material through a person is not so clear-cut.

> If you find any code in Grit that is clearly line-for-line copied, please point it out and I will replace it.

Why is it everyone else’s job to figure out if you’re compliant with the license? That’s your responsibility.

Ignore the haters, this site is turning into MAGA

My dude, it's literally the opposite in this case.

I'm terrified that this somehow seems acceptable to a large group of people.

I'm baffled that other IP holders (say those who own valuable pieces of proprietary software, or music, or movies, or even the LLMs themselves) don't think leopards will come eat their faces next. This erosion of IP has to stop, or anyone who does any intellectual work will be absolutely screwed. If that only meant FOSS people, I'd be worried that we'd just be thrown out with the bathwater – but surely this applies across the board!?

The people doing the intellectual work are usually not the primary beneficiaries of IP laws. In fact it often constrains them unnecessarily.

> The people doing the intellectual work are usually not the primary beneficiaries of IP laws. In fact it often constrains them unnecessarily.

In the sense that most people doing intellectual work do that work for someone else (say, a company) that you consider the primary beneficiary of IP law? Sure, fine – but this applies to almost any other type of work and the legal constructs that are in use there too, so it's not really a very useful distinction to make, even if technically correct.

Or do you mean something else?

Of course they are afraid of it. Haven't you seen Dario being angry about Chinese companies paying for Claude access (tokens = test cases) and training their own models from those?

Well, exactly!

I'm well aware of situations of potentially upending changes where the rich and powerful stand to gain, and the little guy's worries are ignored.

This, however, is clearly a potentially upending change where also lots of the rich and powerful – including those who control the very technology driving the change – have everything to lose. I'm surprised, to put it mildly, that nothing seems to be happening. Does Dario really believe that a strict ToS and stern words will keep his IP protected without appealing to the legal system? (I guess that is par for the course for the people who "solve" world problems with bunkers and armed guards…)

I might even be fine with the loss of IP if everyone lost it.

How does intellectual work happen (beyond doing it for leisure) in a world without IP?

In a utopian world of abundance where we could all be the independently wealthy nobles of the 18th and 19th century who did intellectual work for fun: great. In the world of today where people need to be compensated for their work: what happens?

Any way anyone wants, as long as it doesn't rely on legal control of information. There are countless uncopyrightable things that nevertheless get done and earn a lot of people a living.

I only said "might" and the point was obviously not the immediate surface idea but to point out how the tool of IP is not applied to everyone's benefit equally, but used only against some and only for some, with a side of "You know, fuck it, if they insist on making it worse, it becomes less crazy to consider just burning the house down".

But what are you so afraid of that you react only to the hypothetical as though it were the worst danger?

We'd actually manage to get yoked and abused by the same people no matter what the rules were, don't worry.

How is this an answer to the question? Good thing you aren't in government. Who would so much as bother to write a book?

I'd add: why bother... except for fun. We shouldn't discount enjoyment as motivation. But I otherwise agree wholeheartedly with you. We need intellectual work that goes beyond "just for fun" as long as most people have to work to live.
It’s all a bit voodoo to me, but wouldn't the entire original source code be in the training data also?

Yes, and LLMs have been shown to store and be able to output their training data, so this is at best very sketchy.

Thank you for the answer. I was curious whether they could produce it all from the weights too.