HN Mirror



	by brown 1195 days ago
	For anyone who wants to slow the development of AI, copyright is the soft underbelly to go after.


dvt 1195 days ago

Are you seriously arguing that stealing code is okay in the name of "AI development"?


yamoriyamori 1195 days ago

I think their comment was to the contrary, that the copyright/legal implications of 'stolen' code could seriously hobble the wider development, proliferation, adoption, and commercialization of AI software.


dvt 1195 days ago

Maybe I misunderstood, but the comment seemed to dismiss copyright issues as a cheap way to kill AI ("soft underbelly"). I think stealing code is a pretty serious deal and the onus is on AI software companies to make sure they aren't doing it; it's not "slowing the development of AI" to keep them accountable.


sebzim4500 1195 days ago

Soft underbelly isn't dismissive, they're just saying that it is the natural target to aim for.


hughesjj 1195 days ago

"soft underbelly" is synonymous with "weak point" or "Achilles heel", it's in no way dismissive. If anything, it's the opposite of dismissive.


codexb 1195 days ago

Are you seriously arguing that using short snippets of open source code to inspire similar, yet not exactly the same, original code is "stealing code"? Human developers do that all day long. And just because a piece of code exists in a GPL project doesn't mean it originated there. Every algorithm or sort function likely originated in a more permissively licensed project before it got included in a GPL project.


jupp0r 1195 days ago

What happens if I (a human) read GPL code and then reuse the knowledge gained from it in my own commercial projects? It's not as clear cut as you make it sound.


challengedchip 1195 days ago

It could be as clear-cut as you've just made it: "a human". An LLM is not a human.

You could get into the semantics of "learning" - does JPEG encoding count as the computer "learning" how to reproduce the original image? But trying to create some metric for why LLMs "learn" and JPEG doesn't "learn" on the basis of the algorithms is a philosophical endeavor. Copyright is more about practicality - about realized externalities - than it is about philosophy. That's why selling cars and selling guns are regulated differently, despite the fact that you could reduce both to "metal mechanical machines that kill" by rhetorical argument.

Even from a strictly legal perspective, it actually is fairly clear-cut. The answer to "what if I (a human) read GPL code and then reuse the knowledge gained from it..." comes down to a few straightforward properties of the license. The GPL doesn't cover "reduced to practice" as many corporate contracts do, so terms covering "the knowledge gained" are lenient. The GPL covers "verbatim" copies, which is what LLMs are doing; that's as clear-cut as it gets. Inb4: "So what if I add a few spaces here and there?" - well, the GPL also covers "a work based on"; this is where I (who am not a lawyer) can't speak confidently, but surely there are legal differences between "based on" and "reduced to practice", considering that both are very common occurrences in contracts, so there actually would be a lot of precedent.


jupp0r 1195 days ago

I agree with you that verbatim copies are obviously covered by copyright. What if LLMs reproduce code with changed variable and function names (which would be a great improvement to `cs_gaxpy` in the original article)? What if just the general structure of an algorithm is used? What if the LLM translates the C algorithm from the original article into Rust? This discussion is only scratching the surface.


VWWHFSfQ 1195 days ago

Copyright. Copyright. That is the issue. If you reproduce the code verbatim then you are in violation. This is what the AI is doing.

Just learning from the GPL code to make yourself smarter is not the problem.


snacktaster 1195 days ago

It's going to be an uphill battle just to get people to even understand what the problems are. And this is even a technical forum. Now imagine trying to explain these nuances to a judge or jury.


jacquesm 1195 days ago

It's not so much an ability to understand as it is a desire to not understand in order to be able to ignore the rightsholders' licensing terms.

Plenty of tech companies exist by putting a thin layer on top of the hard work of others and if those others can be ignored then that's what they'll do.


codexb 1195 days ago

The example given in the article isn't verbatim.


HideousKojima 1195 days ago

I... don't see how you read what he said that way at all?


jakelazaroff 1195 days ago

If you read a negative connotation into "slow the development of AI", that's what you get. It's how I'd interpret that comment, too.


lmarcos 1195 days ago

It's not OK, but Microsoft couldn't care less (because they are not going to get fined).


blibble 1195 days ago

yes, because they don't indemnify their customers

anyone sensible should stay the hell away from copilot until the fair use question is settled


bastardoperator 1195 days ago

Looks like they do.

https://github.com/customer-terms/github-copilot-product-spe...

4. Defense of Third Party Claims. If your Agreement provides for the defense of third party claims, that provision will apply to your use of GitHub Copilot. Notwithstanding any other language in your Agreement, any GitHub defense obligations related to your use of GitHub Copilot do not apply if (i) the claim is based on Code that differs from a Suggestion provided by GitHub Copilot, or (ii) you have not enabled all filtering features available in GitHub Copilot.


blibble 1195 days ago

interesting

> If your Agreement provides for the defense of third party claims

do any of them?

it also states:

> You retain all responsibility for Your Code, including Suggestions you include in Your Code or reference to develop Your Code. It is entirely your decision whether to use Suggestions generated by GitHub Copilot. If you use Suggestions, GitHub strongly recommends that you have reasonable policies and practices in place designed to prevent the use of a Suggestion in a way that may violate the rights of others. This includes, but is not limited to, using all filtering features available in GitHub Copilot.

(contra proferentem would apply though)


bastardoperator 1195 days ago

I think it's pretty clear. If you're not filtering, you're liable. If you are and something transpires, they'll fight your legal battle for you which is probably better than any monetary indemnity clause. I assume this is for enterprise users where it actually matters.


noselasd 1195 days ago

The comment is arguing quite the opposite.


IshKebab 1195 days ago

Training AI on code is clearly not the same as stealing it.
