Hacker News
by ricardobeat 451 days ago
I get the commercial/legal angle, but from the viewpoint of AI being something we as a society have an interest in developing, how should this work?

Do you want to severely limit the evolution of models by having them pick (and buy) a tiny subset of all books?

Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

Should Meta buy a physical or electronic copy of every book they want to use for training? That has zero impact on revenue for individual authors.

Would they be paid by word, by token, by book? This makes little sense. We don’t charge people for the knowledge they acquired by going to the library over 50 years; AI just squeezes this into weeks. Our legal framework simply doesn’t fit.

2 comments

> Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

That could actually work. Bearing in mind that all copyright laws are messy and terrible, this proposal is at least not impossible.

"Ever been published" means in the last 100 years.

Ok, 130 million books against $100M training costs. You charge an (unrealistic) 100% tax for the book usage. Each author will get less than a dollar. What is the point other than enriching publishing companies?

You mean, what is the point beyond paying those companies that made such books possible and available for their work? No other point, actually.

> Should Meta buy a physical or electronic copy of every book they want to use for training?

Yes, and if training runs in parallel, probably multiple copies, just as multiple people will need multiple books.

Multiply this by the number of GPUs and AI model providers, and the revenue impact is not zero.
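
For anyone who wants to check the back-of-envelope numbers in this thread, here is a rough sketch. The pool figures (130 million books, $100M training cost, 100% levy) come from the comment above; the function names, the even split across books, and the price, copy count, and provider count in the second call are hypothetical placeholders, not real data.

```python
def per_book_payout(pool_usd: float, num_books: int) -> float:
    """Split a levy pool evenly across every book (an assumed payout rule)."""
    return pool_usd / num_books


def purchase_total(num_books: int, price_usd: float,
                   copies_per_book: int, providers: int) -> float:
    """Total spend if every provider buys `copies_per_book` copies of each book."""
    return num_books * price_usd * copies_per_book * providers


# Figures quoted in the thread: $100M training cost taxed at 100%, 130 million books.
payout = per_book_payout(100e6 * 1.0, 130_000_000)
print(f"pool split per book: ${payout:.2f}")  # about $0.77, i.e. under a dollar

# Hypothetical inputs, purely illustrative: $10 per copy, 1 copy, 1 provider.
total = purchase_total(130_000_000, 10.0, 1, 1)
print(f"buying one copy of each book: ${total:,.0f}")
```

The two models scale differently: the pool is capped by the training budget, while the purchase total grows with the number of copies and providers, which is the point of the last comment.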