| HN Mirror


by ricardobeat 451 days ago

I get the commercial/legal angle, but from the viewpoint of AI being something we as a society have an interest in developing, how should this work?

Do you want to severely limit evolution of models by having them pick (and buy) a tiny subset of all books?

Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

Should Meta buy a physical or electronic copy of every book they want to use for training? That has zero impact on revenue for individual authors.

Would they be paid by word, by token, by book? This makes little sense. We don’t charge people for the knowledge they acquire from going to the library over 50 years; AI just squeezes this into weeks. Our legal framework simply doesn’t fit.

2 comments

card_zero 451 days ago

> Should every training run put money into a pool that gets paid out to every rights holder of every book that has ever been published?

That could actually work. Bearing in mind that all copyright laws are messy and terrible, this proposal is at least not impossible.

"Ever been published" means in the last 100 years.


ricardobeat 450 days ago

Ok, 130 million books against $100M training costs. You charge an (unrealistic) 100% tax for the book usage. Each author will get less than a dollar. What is the point other than enriching publishing companies?
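The back-of-the-envelope arithmetic behind that figure can be sketched as follows. This is only a rough illustration: it assumes the pool is split evenly per book (the comment does not say how it would be divided), and the function name is made up for the example.

```python
def payout_per_book(training_cost_usd: float, tax_rate: float, num_books: int) -> float:
    """Evenly split a levy on training cost across all books (even split is an assumption)."""
    pool = training_cost_usd * tax_rate
    return pool / num_books


# Figures taken from the comment: $100M training cost, 100% tax, 130 million books.
per_book = payout_per_book(100_000_000, 1.0, 130_000_000)
print(f"${per_book:.2f} per book")  # about $0.77
```

Under these assumptions the payout comes to roughly 77 cents per book, which is consistent with the "less than a dollar" estimate.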


alanfranz 449 days ago

You mean, what is the point beyond paying those companies that made such books possible and available for their work? No other point, actually.


alanfranz 451 days ago

> Should Meta buy a physical or electronic copy of every book they want to use for training?

Yes, and if training in parallel, probably multiple copies, just as multiple people need multiple books.

Multiply this by the number of GPUs and AI model providers, and the revenue impact is not zero.
