by hardwaresofton 359 days ago
Another funny, possibly sad, coincidence is that the licenses that made open source what it is will probably be absolutely useless going forward because, as recent precedent has shown, companies can train on whatever they have legally gained access to.

On the other hand, AGPL continues to be the future of F/OSS.


MIT is also still useful; it lets me release code where I don't really care what other people do with it as long as they don't sue me (an actual possibility in some countries)
Which countries would these be?
The US, for one. You can sue nearly anyone for nearly anything, even something you obviously won't win in court, as long as you find a lawyer willing to do it; you don't need any actual legal standing to waste the target's time and money.

Even the most unscrupulous lawyer is going to look at the MIT license, realize the target can defend it for a trivial amount of money (a single form letter from their lawyer) and move on.

You can sue for damages if they have malware in the code; no license protects you from liability for distributing harmful products, even if you do it for free.
If I commit fraud, sure. But the code I release is extremely honest about what it does :)
There are other ways to litigate that the malicious/greedy can use, where MIT offers no protection; e.g. patent trolling.
And illegally too. Anthropic didn't pay for those books they used.

It's too late at this point. The damage is done. These companies trained on illegally obtained data and they will never be held accountable for that. The training is done and they got what they needed. So even if they can't train on it in the future, it doesn't matter. They already have those base models.

Then punitive measures are in order. Add it to the pile of illegal, immoral, and unethical behavior of the feudal tech oligarchs already long overdue for justice. The harm they have done and are doing to humanity should not remain unpunished.
Legally or illegally gained access, too. Lest we forget Meta pirating books.
And the legality of this may vary by jurisdiction. There’s a nonzero chance that they pay a few million in the US for stealing books but the EU or Canada decides the training itself was illegal.
Then the EU and Canada just won't have any sovereign LLMs. They'll have to decide if they'd rather prop up some artificial monopoly or support (by not actively undermining) innovation.
It’s not going to happen. The EU is desperate to stop being in fourth place in technology and will do absolutely nothing to put a damper on this. It’s their only hope to get out of the rut.
Explain how AGPL would prevent AI from being trained on it or AI-generated code competing with it. I have used AGPL for a decade and am still not sure.
It wouldn't -- AGPL code that is picked up would also just get "fair used" into new software.

That said, AGPL as a trend was a huge closing of the spigot of free F/OSS code for companies to use and not contribute back to.

Yes, I hope it was a trend. People were judging me when I first started using it over 10 years ago.
Yup. The book torrenting case is pretty nuts.

If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Pants-on-head idiotic judge.

>If I can reproduce the entirety of most books off the top of my head and sell that to people as a service, it's a copyright violation. If AI does it, it's fair use.

Assuming you're referring to Bartz v. Anthropic, that is explicitly not what the ruling said, in fact it's almost the inverse. The judge said that output from an AI model which is a straight up reproduction of copyrighted material would likely be an explicit violation of copyright. This is on page 12/32 of the judgement[1].

But the vast majority of output from an LLM like Claude is not a word-for-word reproduction; it's a transformative use of the original work. In fact, the authors bringing the suit didn't even claim that it had reproduced their work. From page 7, "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service." That's because Anthropic is already explicitly filtering out results that might contain copyrighted material. (I've run into this myself while trying to translate foreign-language song lyrics to English. Claude will simply refuse to do this.)[2]

[1] https://www.courtlistener.com/docket/69058235/231/bartz-v-an...

[2] https://claude.ai/share/d0586248-8d00-4d50-8e45-f9c5ef09ec81

They should still have to pay damages for possessing the copyrighted material. That's possession, which courts have found is copyright violation. Remember all the 12 year olds who got their parents sued back in the 2000s? They had unauthorized copies.
I don't know what exactly you're referring to here. The model itself is not a copy, you can't find the copyrighted material in the weights. Even if you could, you're allowed under existing case law to make copies of a work for personal use if the copies have a different character and as long as you don't yourself share the new copies. Take the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast onto a recording medium like VHS and Betamax for the purposes of time-shifting one's consumption.

Now, Anthropic was found to have pirated copyrighted work when they downloaded and trained Claude on the LibGen library. And they will likely pay substantial damages for this. So on those grounds, they're as screwed as the 12 year olds and their parents. The trial to determine damages hasn't happened yet though.

> The model itself is not a copy,

Agreed

> the Sony Betamax case, which found that it was legal and a transformative use of copyrighted material to create a copy of a publicly aired broadcast

Good thing libgen is not publicly aired in broadcast format.

> So on those grounds, they're as screwed as the 12 year olds and their parents.

Except they have deep enough pockets to actually pay the damages for each count of infringement. That's the blood most of us want to see shed.

You cannot have trained the model without possession of copyrighted works. Which we seem to be in agreement on.

This was immediately my reaction as well, but I'm not a judge so what do I know. In my own mind I mark it as a "spice must flow" moment -- it will seem inevitable in retrospect but my simple (almost surely incorrect) take is that there just wasn't a way this was going to stop AI's progress. AI as a trend has incredible plot armor at this point in time.

Is the hinge that the tools can recall a huge portion (not perfectly, of course) but usually don't? What seems even more straightforward is the substitute-good idea: it seems reasonable to assume people will buy fewer copies of book X when they start generating books heavily inspired by book X.

But, this is probably just a case of a layman wandering into a complex topic, maybe it's the case that AI has just nestled into the absolute perfect spot in current copyright law, just like other things that seem like they should be illegal now but aren't.

I didn't see the part of the trial where they got the "entirety of most books" out of Llama. What did you see that I didn't?
Sad to say but it would have put US companies at a major disadvantage if they were not allowed to.
I'm not sure that's true. I've never heard of a human being done for copyright for reciting a book passage.

I daresay the difference with AI is that pretty much no human can do that well enough to harm the copyright holder, whereas AI can churn it out.

Yea, that dipshit judge just opened the flood gates for more problems. The problem is they don't understand how this stuff works and they're in the position of having to make a judgement on it. They're completely unprepared to do so.

Now there's precedent for future cases where theft of code or any other work of art can be considered fair use.

The AGPL is a nonfree license that is virtually impossible to comply with.

It’s an EULA trying to pretend it’s a license. You can’t have it both ways.

This is a strong claim, given it is listed as a free, copyleft license:

https://www.gnu.org/licenses/agpl-3.0.en.html

Could you expand on why you think it's nonfree? Also, it's not that hard to comply with either...

For some people "free" means "autonomy", and copyleft licences do a lot to restrict autonomy.
Interestingly, "free" meant autonomy for Stallman and the original proponents of "copyleft"-style licenses too, but autonomy for end users, not developers. Stallman et al. believed copyleft-style licenses maximized autonomy for end users; rightly or wrongly, that was the intent.
Yeah, if it's a problem of definition, then I definitely agree that it might not match there; it certainly isn't a do-anything-you-want license.
"Free" decidedly means autonomy; "I have been freed from prison". Use of the word "free" in many OSS licenses is a jarring euphemism.
marcan does a much more detailed job than I do:

https://news.ycombinator.com/item?id=30495647

https://news.ycombinator.com/item?id=30044019

GNU/FSF are the anticapitalist zealots that are pushing this EULA. Just because they approve of it doesn’t make it free software. They are confused.

I read through them, and I think the analysis overlooks the fact that when the person modifying the program is also its user, it's fine.

Free software refers to user freedoms, not developer freedoms.

I don't think the below is right:

> > Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.

>

> Let's break it down:

>

> > If you modify the Program

>

> That is if you are a developer making changes to the source code (or binary, but let's ignore that option)

>

> > your modified version

>

> The modified source code you have created

>

> > must prominently offer all users interacting with it remotely through a computer network

>

> Must include the mandatory feature of offering all users interacting with it through a computer network (computer network is left undefined and subject to wide interpretation)

I read the AGPL to mean if you modify the program then the users of the program (remotely, through a computer network) must be able to access the source code.

It has yet to be tested, but that seems like the common-sense reading to me (which matters, because judges do apply judgement). It just seems like they are trying too hard to do a legal gotcha. I'm not a lawyer so I can't speak to that, but I certainly don't read it the same way.

I don't agree with this interpretation of every-change-is-a-violation either:

> Step 1: Clone the GitHub repo

>

> Step 2: Make a change to the code - oops, license violation! Clause 13! I need to change the source code offer first!

>

> Step 1.5: Change the source code offer to point to your repo

This example seems incorrect -- modifying the code does not automatically make people interact with the program over a network...

"free software" was defined by the GNU/FSF... so I generally default to their definitions. I don't think the license falls afoul of their stated definitions.

That said, they're certainly anti-capitalist zealots; that's kind of their thing. I don't agree with that, but that's beside the point.

It's not really "virtually impossible to comply with". It's very restrictive, yes, but not hard to comply with if you want to.

And yes, it is an EULA pretending to be a license. I'd put good odds on it being illegal in my country, and it may even be illegal in the US. But it's well aligned with the goals of GNU.

And if the AI companies don't like the license, they will ignore it or pay to be given a waiver. Long may they rot in hell for doing that.
Hell is, by design, a consequence for poor people. (People could literally pay the church to not go to hell[0]). Rich people have no consequences whatsoever, let alone poor people consequences.

[0] https://www.cambridge.org/core/books/abs/preaching-the-crusa...

Not "by design", as historically hell came first. It was only much later that the Catholic Church started talking about purgatory and the possibility of reducing your punishment by paying money.
The people running AI companies have figured out that there is no such thing as hell. We have to come up with new reasons for people to behave in a friendly way.
We already have such reasons. Besides, all religious "kindness" was never kindness without strings attached, even though they'd like you to think that was the case.
The people running AI companies aren't magic, they can't be certain about what comes after death.
If I can have AI retype all code per my desire how exactly is source code special?

I like open source. I also don't think that is where the magic is anymore.

It was scale for 20 years.

Now it is speed.