by visarga
1859 days ago
> But then you have e.g. GPT that is reproducing some (largeish) parts of the training set word-for-word, which might be infringing.

Easy fix: keep a Bloom filter of hashed n-grams from the training set and check generated text against it, so the model never repeats more than N consecutive words verbatim.
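A rough sketch of what that could look like in Python, not a production implementation. All names here (`BloomFilter`, `build_filter`, `repeats_training_text`) are made up for illustration, and the filter size and hash count are arbitrary assumptions. To forbid repeating more than N words, you would store (N+1)-grams, so `n` below is the n-gram length. Note that a Bloom filter can give false positives (it would occasionally block novel text) but never false negatives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(words, n):
    """Yield every run of n consecutive words as a single string."""
    return (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def build_filter(training_texts, n):
    """Insert every n-gram of every training text into a Bloom filter."""
    bf = BloomFilter()
    for text in training_texts:
        for gram in ngrams(text.lower().split(), n):
            bf.add(gram)
    return bf


def repeats_training_text(candidate, bf, n):
    """True if any n-gram of the candidate is (probably) in the training set."""
    return any(gram in bf for gram in ngrams(candidate.lower().split(), n))
```

In a real decoder you would presumably run the check on the last `n` tokens before emitting each new token and resample on a hit, and work on token IDs rather than whitespace-split words. Those details are assumptions on my part, not something spelled out above.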