Hacker News
by __forward__ 754 days ago
The copyright situation around all this is very... interesting. It's pretty clear that this dataset is not legal, but what about the resulting models? What if the texts actually were bought 'properly'?
3 comments

Buying a copy of the book would not give you any copyright license. You could only make copies for personal use.
If you are in a jurisdiction with TDM exceptions, buying a personal copy does allow you to train on it.
The race is on to figure out a satisfactory way to get LLMs to produce content for training other LLMs. Eventually the dataset question will get figured out in the courts, but if there’s a technique to generate more training data in an automated way, then the court decision doesn’t matter.

Edit: also, I don’t believe court decisions can be enforced retroactively, so existing LLMs would be safe, but I’m most definitely not a lawyer.

If you steal a PC and use it to build a very successful app, would that app be legal? Would the use of said app by third parties be legal?