by jchw 420 days ago

> I asked first and I don't want to influence your response. So, go ahead. You first.

It's simple: I'm not dodging the question, it's just that I don't know. It's complicated. It's easy to punch someone in the face and say "I have harmed this person" but things go into the weeds quickly. Like, can you harm someone through inaction? It's a surprisingly deep philosophical question and I am not a philosopher. I don't think determining exactly what harm is matters in this particular case anyway, and any definition I could come up with would probably have holes in it and lead to a large debate that I'd argue isn't actually relevant to the point(s) being made.

> If your only answer is that plagiarism is bad then I agree with that (in certain settings, such as education), but it's clearly no longer considered to be illegal (if it ever was?) or immoral. Just look at all the bigtech LLMs doing so while raising billions without getting into legal trouble. So apparently society has recently decided that this is fine.

Say we really did crack the code on how human learning works and distilled it into an algorithm. If you were able to use this algorithm to produce a representation of learned skills and knowledge, e.g. something lossy enough to be considered legally distinct rather than just a compressed form of the training data, then surely this would not be considered a derivative work of the copyright material used to train it. I think most people would agree with this. (Note the obvious caveats, e.g. if your weights do contain obvious artifacts of direct memorization then it would still be a legal problem.)

Clearly we haven't done that yet, but we did do something that sits between "lossless compression" and "human learning". The courts have the unenviable job of trying to figure out where to draw the line when we still don't really understand what's going on.

I don't really like the heist that occurred with machine learning, but I also lack a satisfactory answer on what exactly it is they did wrong (except for the obvious, e.g. committing massive amounts of piracy and DDoS'ing the entire Internet for the sake of training data.) I don't think anybody could have foreseen decades ago what would happen with machine learning well enough to make laws that would adequately cover it, and tech companies always move way too fast for regulators to keep up.

However, I don't believe that this means that all plagiarism is simply okay, either legally or morally. I just think we lack an adequate legal framework to represent our moral quandaries with big tech machine learning operations, as the traditional notion of plagiarism doesn't cover the complexities of model weights or model outputs. I also don't think that the current legal frameworks will last forever; it's a golden era for ML companies, but assuming they haven't and aren't cracking the code on artificial cognition (I strongly believe they're not near it atm) I believe regulations will eventually catch up some time after the hype has died down.

jillyboel 420 days ago

Alright, my point is that any harm done here is significantly less than what the bigtech LLMs are doing. If plagiarizing code is bad then so is both building & using LLMs. If building & using LLMs is fine, then so is plagiarizing code.

In this case there's a non-commerical open source project that ignored some other project's licenses. This isn't great, but it doesn't affect me, a third party, in the slightest. I have no reason to be upset about this. It doesn't really affect the other projects either, nor does it negatively affect our society. If anything it adds to our society by giving something people are clearly interested in having.

In the case of RTEMS the only thing they're missing out on is attribution. Nintendo isn't missing out on anything at all, people will still be buying their hardware to run this software.

So my argument is that any harm that may have been done is insignificant at best. Hardly worth getting upset about, especially as a third party.

As for the legal argument, it's hypocritical at best. If someone wants to condemn what happened here they should first go after the big boys who are making billions by doing the same thing on a massive scale.

> If you were able to use this algorithm to produce a representation of learned skills and knowledge, e.g. something lossy enough to be considered legally distinct rather than just a compressed form of the training data, then surely this would not be considered a derivative work of the copyright material used to train it. I think most people would agree with this.

If it's okay for an algorithm to do then it's okay for a human to do. So in that case copyright would be dead since the conclusion is you (or a machine learning algorithm) are allowed to ingest some content, then produce similar content.

A simple example is using an LLM to draw an image of some disney characters. If we say the LLM is allowed to do this because it learned to do so, which we aren't considering to be plagiarism, then why are human artists being sued by disney for doing the same?

Or in this case, let's say the original authors used an advanced LLM to assist their coding. The LLM once happened to ingest Nintendo's binary blobs during training and was advanced enough to learn from them. It uses this knowledge to produce code that can interface with the hardware which just so happens to look like the original code because that's just how you do it. Is it suddenly not plagiarism anymore? Did it become morally okay because the LLM laundered the code? Is this any different from LLMs ingesting all of github and becoming coding assistants? Why are we okay with that, but not when a human does it?

I know that in the end the legal answer is that if you have enough money you can do whatever you want, but this doesn't answer the moral questions.

jchw 420 days ago

> Alright, my point is that any harm done here is significantly less than what the bigtech LLMs are doing. If plagiarizing code is bad then so is both building & using LLMs. If building & using LLMs is fine, then so is plagiarizing code.

This falls under the "two wrongs don't make a right" adage, I'd argue. (To clarify... I agree, at least insofar as LLM training is plagiarism.)

> In this case there's a non-commerical [sic] open source project that ignored some other project's licenses. This isn't great, but it doesn't affect me, a third party, in the slightest. I have no reason to be upset about this.

Personally, I do sometimes get upset about things that don't directly affect me, as the result of empathy, sympathy, and having principles. I think if you really think about it you'd agree.

> It doesn't really affect the other projects either, nor does it negatively affect our society. If anything it adds to our society by giving something people are clearly interested in having.

Calculating the effect of one person committing plagiarism is impossible. Part of the reason it's taboo is because I think we all agree the world is a better place when people are honest and give credit where credit is due, even when the threat of a lawsuit is not looming. Even if you are going to potentially violate a copyright license, you may as well be forthcoming about it IMO. And I'm not asking for perfection; we all make mistakes, after all. It's really about how you handle things once a mistake is brought to your attention.

As far as this goes, the tricky part is that right now, anyone distributing software that includes libogc, e.g. almost all GameCube and Wii homebrew, is potentially guilty of unauthorized distribution of copyrighted materials. It is hard to quantify how severe this is, but if you are trying to follow the letter of the law closely, especially since you're likely already engaging in gray area activities like console hacking, you probably want to keep a strong distance from illicit activities. I strongly believe that courts will consider how strong your public commitment to not purposefully violating the law was if you wind up going to trial. Just look at how the Discord conversations wound up factoring into the Citra case. Now that everyone is aware, the ball is suddenly in hundreds of people's courts to figure out; presumably most of them will just do nothing and ignore it, but it's difficult to really quantify what damage is done here.

The homebrew scene has a strong reason to distance themselves from software piracy. When the homebrew scene itself is building on top of potential copyright infringement, that's not a good look. It looks an awful lot like hypocrisy.

> In the case of RTEMS the only thing they're missing out on is attribution.

This part needs a deeper investigation. RTEMS has been relicensing to BSD 2-clause for a while, but some of the older code might only have been available under a variant of the GPL. Software that includes libogc today can't possibly be adhering to the RTEMS license since they will be missing the proper copyright notice and disclaimer, so this will take time to resolve. Meanwhile the modified GPL variant is likely OK for most projects, but it might pose licensing issues for some.

> Nintendo isn't missing out on anything at all, people will still be buying their hardware to run this software.

Those statements are largely not related, and not even necessarily true as plenty of people run homebrew on emulators. In fact, in many cases, you'll wind up running homebrew more on an emulator than on the real machine just because it's easier to do. Some homebrew actually specifically supports emulators and will take advantage of running in an emulator.

While this seems kind of silly, it's actually a big argument in favor of the legitimacy of emulators, as rather than simply emulate the console, you can argue what they actually do is emulate a platform that is a compatible superset of the game console.

> So my argument is that any harm that may have been done is insignificant at best. Hardly worth getting upset about, especially as a third party.

> As for the legal argument, it's hypocritical at best. If someone wants to condemn what happened here they should first go after the big boys who are making billions by doing the same thing on a massive scale.

It is totally possible to condemn both things. However, if you're a member of the homebrew scene, there's a good chance that one of those problems is more personally relevant to you than the other.

> If it's okay for an algorithm to do then it's okay for a human to do. So in that case copyright would be dead since the conclusion is you (or a machine learning algorithm) are allowed to ingest some content, then produce similar content.

> A simple example is using an LLM to draw an image of some disney characters. If we say the LLM is allowed to do this because it learned to do so, which we aren't considering to be plagiarism, then why are human artists being sued by disney for doing the same?

I think this is an even bigger can of worms than the AI one. We actually don't have a lot of case law on the legality of fan art and fan works in general. Note though that legal bullying can be effective even when the plaintiffs have no real leg to stand on, so it's hard to really judge what it means when someone has to fold to legal threats.

Meanwhile, if you think you can get away with it, I'd actually implore you or anyone else daring enough to try selling a blatantly copyright-infringing Disney T-shirt using Stable Diffusion for the artwork. I strongly doubt this would hold up in court. (If it did, it would be very funny.)

> Or in this case, let's say the original authors used an advanced LLM to assist their coding. The LLM once happened to ingest Nintendo's binary blobs during training and was advanced enough to learn from them. It uses this knowledge to produce code that can interface with the hardware which just so happens to look like the original code because that's just how you do it. Is it suddenly not plagiarism anymore? Did it become morally okay because the LLM laundered the code? Is this any different from LLMs ingesting all of github and becoming coding assistants? Why are we okay with that, but not when a human does it?

Actually you just described some real world case law, at least in the United States. Recently, Google LLC v. Oracle America, Inc. established that copying code for the sake of interoperability can be considered fair use. Similarly, Atari Games Corp. v. Nintendo and Sega Enterprises Ltd. v. Accolade, Inc. helped establish this earlier, and you could argue Sony's Connectix and Bleem! lawsuits as well, as both Connectix and Bleem! used some degree of non-cleanroom reverse engineering.

Copying code for the sake of interoperability can be fair use, even for libogc, even if some of the resulting code is necessarily structurally similar to the original code. However, e.g. just copying decompilations directly out of HexRays or Ghidra is unlikely to hold up. (Disclaimer: obviously, IANAL.)

Today's case law regarding machine learning models mostly establishes that the model weights themselves are not inherently infringing because of the training data. I'd argue the implicit legality of model outputs is significantly less charted waters and that is exactly why some ML vendors are providing indemnity agreements: they want to reassure their customers that they will not be liable if the model's outputs are found to be infringing, because there absolutely is still risk that model outputs could be found to be infringing. It is not black-letter law that anything that comes out of a model is necessarily free of copyright, trademark and patent infringement.

> I know that in the end the legal answer is that if you have enough money you can do whatever you want, but this doesn't answer the moral questions.

Sure, and just because someone does or doesn't choose to sue also doesn't mean something is morally good or bad.

jillyboel 420 days ago

> This falls under the "two wrongs don't make a right" adage, I'd argue. (To clarify... I agree, at least insofar as LLM training is plagiarism.)

The argument is that so far society at large seems to have decided that what bigtech has done with LLMs is not wrong. Everyone is happily using it, pretty much every company is touting their new "AI" features, and lawsuits haven't gained any traction. So if it's not wrong for an LLM, I'd argue it's not wrong for a human, either.

> Personally, I do sometimes get upset about things that don't directly affect me, as the result of empathy, sympathy, and having principles. I think if you really think about it you'd agree.

True, in this case I feel sympathy for the poor developers who put time into making a free open source tool that brought many people joy now being harassed over something as insignificant as a license dispute. It's all just made up nonsense designed to protect the big boys who can afford the fancy lawyers anyway.

The rest of the post is mostly about the legal angle where I'm sure you're right, but the main take-away is that these people did not really do anything morally wrong. It's just because of legal bullying that they have to be careful. So my distaste is aimed at those who perform the legal bullying and those who enable it, not at their victims.

jchw 419 days ago

> The argument is that so far society at large seems to have decided that what bigtech has done with LLMs is not wrong. Everyone is happily using it, pretty much every company is touting their new "AI" features, and lawsuits haven't gained any traction. So if it's not wrong for an LLM, I'd argue it's not wrong for a human, either.

Right, because if one wrong thing is allowed, we should allow... Other wrong things.

That sounds a lot like two wrongs making a right.

> True, in this case I feel sympathy for the poor developers who put time into making a free open source tool that brought many people joy now being harassed over something as insignificant as a license dispute. It's all just made up nonsense designed to protect the big boys who can afford the fancy lawyers anyway.

Okay. Well I feel sympathy for the poor developers who put time into making a free open source tool that brought many people joy now having their work ripped off without credit because I guess it's okay if LLM training is legal for some reason.

I mean, honest to God, how much rationalizing are we going to go through here? It's okay because LLMs? It's okay because it's free so that means plagiarism is fine? Copyright licenses are "made up nonsense"?

Marcan's response is disproportionate; I never even denied this. That doesn't really have any bearing on whether or not this libogc issue is a problem, and it is still a problem.

jillyboel 418 days ago

> Right, because if one wrong thing is allowed, we should allow... Other wrong things.

Unless you think LLMs deserve more rights than actual flesh and blood humans, yes. Either that or get bigtech to stop what they're doing, but we both know that's not going to happen.

> Okay. Well I feel sympathy for the poor developers who put time into making a free open source tool that brought many people joy now having their work ripped off without credit because I guess it's okay if LLM training is legal for some reason.

If LLMs get to do it, so do humans. If humans don't get to do it, then LLMs don't either. Society has embraced LLMs and bigtech is not going to give up their new toy, so it has to become okay for humans, even if you think this is unfortunate.

> I mean, honest to God, how much rationalizing are we going to go through here?

The same amount as people go through to excuse legal bullying.

> It's okay because LLMs?

Again, humans deserve more rights than machines, not less. So apparently, yes.

> It's okay because it's free so that means plagiarism is fine?

Yeah. No one was hurt, not even financially. You can think it's distasteful, that's fine.

> Copyright licenses are "made up nonsense"?

Yup, only serves to make the rich richer anyway.

jchw 418 days ago

> Unless you think LLMs deserve more rights than actual flesh and blood humans, yes. Either that or get bigtech to stop what they're doing, but we both know that's not going to happen.

> If LLMs get to do it, so do humans. If humans don't get to do it, then LLMs don't either. Society has embraced LLMs and bigtech is not going to give up their new toy, so it has to become okay for humans, even if you think this is unfortunate.

> Again, humans deserve more rights than machines, not less. So apparently, yes.

Man, you are fucking obsessed with LLMs. This incident predates the existence of LLMs, has nothing to do with LLMs, and plagiarism and ML training are two completely different issues. And, you keep acting like I am saying I think what happened with LLMs is fine, which I have never said at any point. I didn't say that what happened and is happening with LLMs is fine, only that it is a completely different thing that bears no relation to this whatsoever. Nobody mentioned LLMs. It's not a thing here. Stop talking about fucking LLMs.

> Yup, only serves to make the rich richer anyway.

The GPL is a great example of a copyright license that is explicitly not designed to make the rich richer.
