HN Mirror

	by burroisolator 359 days ago
	AI only got big, especially for coding, because they were able to train on a massive corpus of open source code. I don't think it is a coincidence.

2 comments

hardwaresofton 359 days ago

Another funny, possibly sad, coincidence is that the licenses that made open source what it is will probably be absolutely useless going forward, because as recent precedent has shown, companies can train on what they have legally gained access to.

On the other hand, AGPL continues to be the future of F/OSS.

haiku2077 359 days ago

MIT is also still useful; it lets me release code where I don't really care what other people do with it, as long as they don't sue me (an actual possibility in some countries).

LtWorf 359 days ago

Which countries would these be?

haiku2077 359 days ago

The US, for one. You can sue nearly anyone for nearly anything, even something you obviously won't win in court, as long as you find a lawyer willing to do it; you don't need any actual legal standing to waste the target's time and money.

Even the most unscrupulous lawyer is going to look at the MIT license, realize the target can defend it for a trivial amount of money (a single form letter from their lawyer) and move on.

Jensson 359 days ago

You can sue for damages if they have malware in the code; there is no license that protects you from distributing harmful products, even if you do it for free.

haiku2077 358 days ago

If I commit fraud, sure. But the code I release is extremely honest about what it does :)

thih9 358 days ago

There are other ways to litigate that the malicious/greedy can use, where MIT offers no protection; e.g. patent trolling.

tom_m 358 days ago

And illegally too. Anthropic didn't pay for those books they used.

It's too late at this point. The damage is done. These companies trained on illegally obtained data and they will never be held accountable for that. The training is done and they got what they needed. So even if they can't train on it in the future, it doesn't matter. They already have those base models.

ddq 358 days ago

Then punitive measures are in order. Add it to the pile of illegal, immoral, and unethical behavior of the feudal tech oligarchs already long overdue for justice. The harm they have done and are doing to humanity should not remain unpunished.

malfist 358 days ago

Legally or illegally gained access to. Lest we forget Meta pirating books.

coffeefirst 358 days ago

And the legality of this may vary by jurisdiction. There’s a nonzero chance that they pay a few million in the US for stealing books but the EU or Canada decide the training itself was illegal.

andy99 358 days ago

Then the EU and Canada just won't have any sovereign LLMs. They'll have to decide whether they'd rather prop up some artificial monopoly or support (by not actively undermining) innovation.

foobiekr 358 days ago

It’s not going to happen. The EU is desperate to stop being in fourth place in technology and will do absolutely nothing to put a damper on this. It’s their only hope to get out of the rut.

EGreg 358 days ago

Explain how AGPL would prevent AI from being trained on it or AI-generated code from competing with it. I have used AGPL for a decade and am still not sure.

hardwaresofton 358 days ago

It wouldn't -- AGPL code that is picked up would also just get "fair used" into new software.

That said, AGPL as a trend was a huge closing of the spigot of free F/OSS code for companies to use and not contribute back to.

EGreg 358 days ago

Yes, I hope it was a trend. People were judging me when I first started using it over 10 years ago.

jorvi 359 days ago

Yup. The book torrenting case is pretty nuts.

If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Pants-on-head idiotic judge.

derektank 358 days ago

>If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Assuming you're referring to Bartz v. Anthropic, that is explicitly not what the ruling said; in fact, it's almost the inverse. The judge said that output from an AI model which is a straight-up reproduction of copyrighted material would likely be an explicit violation of copyright. This is on page 12/32 of the judgement[1].

But the vast majority of output from an LLM like Claude is not a word for word reproduction; it's a transformative use of the original work. In fact, the authors bringing the suit didn't even claim that it had reproduced their work. From page 7, "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service." That's because Anthropic is already explicitly filtering out results that might contain copyrighted material. (I've run into this myself while trying to translate foreign language song lyrics to English. Claude will simply refuse to do this)[2]

[1] https://www.courtlistener.com/docket/69058235/231/bartz-v-an...

[2] https://claude.ai/share/d0586248-8d00-4d50-8e45-f9c5ef09ec81

gosub100 358 days ago

They should still have to pay damages for possessing the copyrighted material. That's possession, which courts have found is copyright violation. Remember all the 12 year olds who got their parents sued back in the 2000s? They had unauthorized copies.

derektank 358 days ago

I don't know what exactly you're referring to here. The model itself is not a copy, you can't find the copyrighted material in the weights. Even if you could, you're allowed under existing case law to make copies of a work for personal use if the copies have a different character and as long as you don't yourself share the new copies. Take the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast onto a recording medium like VHS and Betamax for the purposes of time-shifting one's consumption.

Now, Anthropic was found to have pirated copyrighted work when they downloaded and trained Claude on the LibGen library. And they will likely pay substantial damages for this. So on those grounds, they're as screwed as the 12 year olds and their parents. The trial to determine damages hasn't happened yet though.

gosub100 358 days ago

> The model itself is not a copy,

Agreed

> the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast

Good thing libgen is not publicly aired in broadcast format.

> So on those grounds, they're as screwed as the 12 year olds and their parents.

Except they have deep enough pockets to actually pay the damages for each count of infringement. That's the blood most of us want to see shed.

You cannot have trained the model without possession of copyrighted works. Which we seem to be in agreement on.

hardwaresofton 359 days ago

This was immediately my reaction as well, but I'm not a judge so what do I know. In my own mind I mark it as a "spice must flow" moment -- it will seem inevitable in retrospect but my simple (almost surely incorrect) take is that there just wasn't a way this was going to stop AI's progress. AI as a trend has incredible plot armor at this point in time.

Is the hinge that the tools can recall a huge portion (not perfectly, of course) but usually don't? What seems even more straightforward is the substitute-good argument: it seems reasonable to assume people will buy fewer copies of book X when they start generating books heavily inspired by book X.

But this is probably just a case of a layman wandering into a complex topic. Maybe AI has just nestled into the absolute perfect spot in current copyright law, just like other things that seem like they should be illegal now but aren't.

fragmede 359 days ago

I didn't see the part of the trial where they got the "entirety of most books" out of Llama. What did you see that I didn't?

redman25 358 days ago

Sad to say but it would have put US companies at a major disadvantage if they were not allowed to.

tim333 358 days ago

I'm not sure that's true. I've never heard of a human being done for copyright for reciting a book passage.

I daresay the difference with AI is that pretty much no human can do that well enough to harm the copyright holder, whereas AI can churn it out.

tom_m 358 days ago

Yea, that dipshit judge just opened the floodgates for more problems. The problem is they don't understand how this stuff works, and they're in the position of having to make a judgement on it. They're completely unprepared to do so.

Now there's precedent for future cases where theft of code or any other work of art can be considered fair use.

sneak 358 days ago

The AGPL is a nonfree license that is virtually impossible to comply with.

It’s an EULA trying to pretend it’s a license. You can’t have it both ways.

hardwaresofton 358 days ago

This is a strong claim, given it is listed as a free, copyleft license:

https://www.gnu.org/licenses/agpl-3.0.en.html

Could you expand on why you think it's nonfree? Also, it's not that hard to comply with either...

px43 358 days ago

For some people "free" means "autonomy", and copyleft licences do a lot to restrict autonomy.

jrochkind1 358 days ago

Interestingly, "free" meant autonomy for Stallman and the original proponents of copyleft-style licenses too, but autonomy for end users, not developers. Stallman et al. believed copyleft-style licenses maximized autonomy for end users; rightly or wrongly, that was the intent.

hardwaresofton 358 days ago

Yeah, if it's a problem of definition, then I definitely agree that it may not match there; it certainly isn't a do-anything-you-want license.

waffletower 358 days ago

"Free" decidedly means autonomy; "I have been freed from prison". Use of the word "free" in many OSS licenses is a jarring euphemism.

tedheath123 358 days ago

cf. https://en.wikipedia.org/wiki/Two_Concepts_of_Liberty

sneak 358 days ago

marcan does a much more detailed job than I do:

https://news.ycombinator.com/item?id=30495647

https://news.ycombinator.com/item?id=30044019

GNU/FSF are the anticapitalist zealots that are pushing this EULA. Just because they approve of it doesn’t make it free software. They are confused.

hardwaresofton 358 days ago

I read through it, and I think the analysis suffers from ignoring the case where the modifier is also the user, in which case it's fine.

Free software refers to user freedoms, not developer freedoms.

I don't think the below is right:

> > Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.

> Let's break it down:

> > If you modify the Program

> That is if you are a developer making changes to the source code (or binary, but let's ignore that option)

> > your modified version

> The modified source code you have created

> > must prominently offer all users interacting with it remotely through a computer network

> Must include the mandatory feature of offering all users interacting with it through a computer network (computer network is left undefined and subject to wide interpretation)

I read the AGPL to mean if you modify the program then the users of the program (remotely, through a computer network) must be able to access the source code.

It has yet to be tested, but that seems like the common-sense reading to me (which matters, because judges do apply judgement). It just seems like they are trying too hard to do a legal gotcha. I'm not a lawyer, so I can't speak to that, but I certainly don't read it the same way.

I don't agree with this interpretation of every-change-is-a-violation either:

> Step 1: Clone the GitHub repo

> Step 2: Make a change to the code - oops, license violation! Clause 13! I need to change the source code offer first!

> Step 1.5: Change the source code offer to point to your repo

This example seems incorrect -- modifying the code does not automatically make people interact with the program over a network...

"free software" was defined by the GNU/FSF... so I generally default to their definitions. I don't think the license falls afoul of their stated definitions.

That said, they're certainly anti-capitalist zealots; that's kind of their thing. I don't agree with that, but that's beside the point.

marcosdumay 358 days ago

It's not really "virtually impossible to comply with". It's very restrictive, yes, but not hard to comply if you want to.

And yes, it is an EULA pretending to be a license. I'd put good odds on it being illegal in my country, and it may even be illegal in the US. But it's well aligned with the goals of GNU.

surfingdino 358 days ago

And if the AI companies don't like the license, they will ignore it or pay to be given a waiver. Long may they rot in hell for doing that.

yard2010 358 days ago

Hell is, by design, a consequence for poor people. (People could literally pay the church to not go to hell[0]). Rich people have no consequences whatsoever, let alone poor people consequences.

[0] https://www.cambridge.org/core/books/abs/preaching-the-crusa...

GTP 358 days ago

Not "by design", as historically hell came first. It was only much later that the Catholic Church started talking about purgatory and the possibility of reducing your punishment by paying money.

smokel 358 days ago

The people running AI companies have figured out that there is no such thing as hell. We have to come up with new reasons for people to behave in a friendly way.

fennecbutt 358 days ago

We already have such reasons. Besides, all religious "kindness" was never kindness without strings attached, even though they'd like you to think that was the case.

_heimdall 358 days ago

The people running AI companies aren't magic, they can't be certain about what comes after death.

pizzafeelsright 358 days ago

If I can have AI retype all code per my desire how exactly is source code special?

I like open source. I also don't think that is where the magic is anymore.

It was scale for 20 years.

Now it is speed.

bravesoul2 358 days ago

Open source may be necessary but it is not sufficient. You also needed the compute power and architecture discoveries and the realisation that lots of data > clever feature mapping for this kind of work.

A world without open source may have given birth to 2020s AI but probably at a slower pace.
