I am aware, I'm asking if the model, however, is infringing. Surely you can't distribute them in a dataset but is training on copyrighted data legal, and can you distribute that model?
All text written by a human in the US is automatically copyright the author. So if an engine trained on works under copyright is a derivative work, GPT3 and friends have serious problems.