by huhuhu111 769 days ago
Isn't all/most of their training data copyrighted anyways?

We just have to say it's fair use, because it is useful to everyone. Maybe just require them to open their model.

Yup. The big pretense we pull as an industry is to pretend all of the data for all these models are somehow legitimate. It's all illegal. But what are you gonna do about it?
> Yup. The big pretense we pull as an industry is to pretend all of the data for all these models are somehow legitimate. It's all illegal. But what are you gonna do about it?

I feel the tech industry took the proverb "better ask for forgiveness than permission," then dropped the "forgiveness" part.

I think anyone who wants to opt out of being in the training data for LLMs should be able to, just like anyone who doesn't want their website indexed by Google should be able to opt out.