Hacker News new | ask | show | jobs
by underlipton 34 days ago
Fair use was built around human limitations. The mass scraping campaigns done by the AI giants were clearly an overreach in spirit, if not letter. Most people's intuition is that these massive operations that are valued in the trillions can't have been drawn from some untapped common resource, and they're correct. Someone, somewhere is not being properly compensated.

I have no problem with taxing AI companies so that their profit is marginal, or forcing them to provide compute for free. That seems like the correct balance of what they're harvesting from the "commons" (which is really just the totality of private IP that was exposed to their crawlers).

Fair use is the balance between creators and those who in some way use the content. Somehow it has become an excuse not to compensate the creators in any way. To me, AI training really looks like something that should be treated separately, so that creators are compensated when their works are used.

Now, how much, and whether it should be based on revenue from output, is open for discussion. And it might also be that there is no fair model to pay them. Which means that, well, too bad for LLMs...

The nature of how LLMs work makes it impossible to connect a derivative work to its source in the training data. However, the weights couldn't exist without that training data - the works of the creators were used during training - and the entities making money off the use of that training data are primarily the LLM platform owners. So they should pay.

We are trying to avoid another situation where "resource wealth" goes uncompensated, where producers remain poor while processors, marketers, and merchants reap all the benefit. Unless your aim is something else, in which case you should state it.