Hacker News
by gruez 514 days ago
>Have you considered that you have some traits that make you eligible to read books and access information freely in the country you live in*? Something about being a conscious human being enjoying human rights, perhaps?

Not a relevant factor when it comes to copyright law. Fair use (the law that's most applicable here) applies regardless of whether you're a student incorporating news articles into your work or Google making thumbnails and displaying them in its search results.


This is not a good analogy. Google does not display the contents to any significant degree (you have to visit the search result). And even then it was/is in legal trouble, in fact (in some countries like Australia* more than others).

Furthermore:

> Examples of fair use in United States copyright law include commentary, search engines, criticism, parody, news reporting, research, and scholarship.

I do not see “automated generation of derivative works of arbitrary nature” in it.

* https://www.bbc.com/news/world-australia-55760673.amp

>This is not a good analogy. Google does not display the contents to any significant degree (you have to visit the search result).

The point isn't that AI training is legal because it's like generating thumbnails. That is being argued in the courts right now. The point is that fair use exemptions aren't limited to "being a conscious human being enjoying human rights", as Google generating thumbnails and snippets using computers shows.

https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com....

> Examples of fair use in United States copyright law include commentary, search engines, criticism, parody, news reporting, research, and scholarship.

Those are examples, not an exhaustive list. It's not even something that judges are supposed to compare against when deciding whether something is fair use or not; see: https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors

> The point is that fair use exemptions aren't limited to "being a conscious human being enjoying human rights"

Sure. However, my point is that this is not fair use*, so other principles need to be applied. Whether legal systems in various countries find that fair use applies here or not, I agree we are yet to see.

* At least in cases where it’s an LLM operated at scale for profit (which I suppose would not hold for Meta’s models if they were truly open, but that’s not the case if they require obtaining a license in some conditions).

>Sure. However, my point is that this is not fair use (at least in cases where it’s an LLM operated for profit), so other principles need to be applied.

This isn't a complete argument. Most of AI companies' argument relies on the claim that AI models are "transformative". That's a plausible claim, and as Perfect 10 v. Google and Authors Guild, Inc. v. Google, Inc. have shown, being a for-profit company is hardly a disqualification from getting fair use protection.

“Transformative” is always a grey area. If my service just returns you a book you requested, but in upper case, then it was transformed.

But sure, the “transformative” argument is the one that could apply (and I believe even Google used it to argue its case), if it can be shown that an LLM cannot reproduce a given work verbatim (which, incidentally, is something that you, a warm-blooded fleshy human with agency who has the freedom to read books, cannot do, but LLMs were shown to do).

That said, relevant laws existed before LLMs, and may be outdated. If the goal is to balance reasonable uses while protecting the original output of authors that ultimately drives innovation and creativity, I am not sure the preexisting laws are continuing to fulfil their function, but that’s my opinion.

>But sure, the “transformative” argument is the one that could apply (and I believe even Google used it to argue its case), if it can be shown that an LLM cannot reproduce a given work verbatim.

You have to try pretty hard to get LLMs to reproduce a work verbatim, especially any lengthy passages that aren't famous (and thus re-quoted on the internet a bazillion times). Moreover, just because LLMs can reproduce a work verbatim if you try hard enough doesn't mean it's not transformative. Google search snippets and Google Book Search have been ruled "transformative" by the courts, but if you tried hard enough you could use them to extract the entire work.

>That said, relevant laws existed before LLMs, and may be outdated. If the goal is to balance reasonable uses while protecting the original output of authors that ultimately drives innovation and creativity, I am not sure the preexisting laws are continuing to fulfil their function, but that’s my opinion.

AFAIK the era of mining the public internet or published works for AI training data is over, or at least coming to an end. Everything that could be mined has already been mined, and besides, the internet is getting increasingly polluted by AI output. Private training data is where it's at now, whether it's sourcing document troves from companies (e.g. emails, documentation, source code, etc.) or paying "AI annotators" to produce training data for you. If the argument is that human authors should get a cut of AI profits because their works were "stolen" to train the models, this is going to be an increasingly losing argument, because it doesn't have a leg to stand on for private training data.

> I do not see “automated generation of derivative works of arbitrary nature” in it

The “automated” isn’t really key. If you read a book, and learn from it, and are able to use that knowledge in other contexts, should you pay a licensing fee? It doesn’t matter if “you” is a human or machine.

“Automated” is key. You are not an automaton, not a machine, and you do not scale infinitely with compute power; unlike a machine, you have free will and agency, and the legal framework of developed countries grants you human rights that include freedom. That was, in fact, my entire point.