by wongarsu
699 days ago
I doubt they could. Their corpus is almost certainly composed mostly of copyrighted material they don't have a license for. Whether that's a problem for model training is an open question, but they clearly wouldn't be allowed to distribute it as a corpus; that would just be regular copyright infringement. Maybe they could share a list of what's in their corpus, but that wouldn't be very helpful, and it would make it much easier for all affected parties to sue them for using their content in model training.