Hacker News
by themafia 240 days ago
> not be copyright infringement to train on them either

Copyright is about reproduction. It does not cover uses. Once you've bought a copy of a work, it's yours to use, as long as you don't reproduce it outside of fair use.

The problem with most language models is they will often uncritically reproduce significant portions of copyrighted works.