by alanfranz 451 days ago
Does fair use imply that pirating copyrighted material is ok?

I mean, it’s a serious question; I don’t see this as really connected.

As long as an AI can “understand” the content of a book and spit out a summary of it, or even leverage what it learned to perform further inference, I’d be inclined to say that this is fair use; a human would do the same.

But this has nothing to do with using pirated material for training, especially for some kind of commercial purpose (even if Llama is free, they’re building on top of it) - I don’t see why it should be legal.

2 comments

Fair use is literally that:

"Fair use" in copyright law allows limited, specific uses of copyrighted material without permission.

Hence, by definition, not "pirating".

I get the commercial/legal angle, but from the viewpoint of AI being something we as a society have an interest in developing, how should this work?

Do you want to severely limit the evolution of models by having them pick (and buy) a tiny subset of all books?

Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

Should Meta buy a physical or electronic copy of every book they want to use for training? That has zero impact on revenue for individual authors.

Would they be paid by word, by token, by book? This makes little sense. We don’t charge people for the knowledge they acquired while going to the library over 50 years; AI just squeezes this into weeks. Our legal framework simply doesn’t fit.

> Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

That could actually work. Bearing in mind that all copyright laws are messy and terrible, this proposal is at least not impossible.

"Ever been published" means in the last 100 years.

Ok, 130 million books against $100M training costs. You charge an (unrealistic) 100% tax for the book usage. Each author will get less than a dollar. What is the point other than enriching publishing companies?
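
The arithmetic behind that figure can be checked with a quick sketch. The book count, training cost, and 100% tax rate are the numbers quoted above; the even split per book and the name `payout_per_book` are assumptions made only for illustration (an author with several books would receive a multiple of this).

```python
# Numbers quoted in the comment above.
BOOKS = 130_000_000               # books assumed to share the pool
TRAINING_COST_USD = 100_000_000   # cost of one training run
TAX_RATE = 1.0                    # the "(unrealistic) 100% tax"


def payout_per_book(training_cost_usd: float, tax_rate: float, books: int) -> float:
    """Split the taxed amount evenly across all books (an assumed payout rule)."""
    return training_cost_usd * tax_rate / books


per_book = payout_per_book(TRAINING_COST_USD, TAX_RATE, BOOKS)
print(f"${per_book:.2f} per book")  # prints "$0.77 per book"
```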

You mean, what is the point beyond paying those companies that made such books possible and available for their work? No other point, actually.

> Should Meta buy a physical or electronic copy of every book they want to use for training?

Yes, and if they train in parallel, probably multiple copies, just as multiple people need multiple books.

Multiply this by the number of GPUs and AI model providers, and the revenue impact is not zero.