by wongarsu
1855 days ago
Generally, an ML model transforms the copyrighted material to the point where it isn't recognizable, so it should be treated as its own unrelated work that is neither infringing nor derivative. But then you have e.g. GPT reproducing some (largish) parts of the training set word-for-word, which might be infringing. Also, I don't think there have been any major court cases about this, so there's no clear precedent in either direction.
|
Easy fix: keep a Bloom filter of hashed n-grams from the training set, and use it to make sure the output never repeats more than N consecutive words from that set.
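Roughly what that could look like, as a minimal sketch rather than anything a real model ships with. Everything here is an assumption made for illustration: the names (`BloomFilter`, `build_filter`, `repeats_training_text`), the filter size, the number of hash functions, and the lowercase-and-split-on-whitespace tokenization. To cap repeats at N words, you index every run of N+1 consecutive words from the training set and reject any output containing one.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit
        # halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(words, n):
    """All runs of n consecutive words, joined into single strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_filter(training_texts, n):
    """Add every n-gram of every training text to a new filter."""
    bf = BloomFilter()
    for text in training_texts:
        for gram in ngrams(text.lower().split(), n):
            bf.add(gram)
    return bf


def repeats_training_text(output, bf, n):
    """True if some run of n words in output is (probably) in the filter."""
    return any(gram in bf for gram in ngrams(output.lower().split(), n))


# Allow at most N = 3 repeated words, so index 4-grams.
N = 3
bf = build_filter(["the quick brown fox jumps over the lazy dog"], N + 1)
copied = repeats_training_text("a quick brown fox jumps high", bf, N + 1)
novel = repeats_training_text("the quick brown cat", bf, N + 1)
```

Two caveats with this sketch. A Bloom filter can return false positives, so it will occasionally reject text that was never in the training set, though it won't miss an n-gram that was actually added. And it only catches exact matches after the tokenization above, so a copy with one word swapped in every window of N+1 words gets through.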