by int_19h
297 days ago
LLMs can and do sometimes regurgitate parts of their training data verbatim - this has been demonstrated many times on things ranging from Wikipedia articles to code snippets. Yes, it is not particularly likely that any one damning private email of yours will be memorized, but if you train a model on a dataset with millions of private emails, it will almost certainly memorize some of them, and nobody knows what exact sequence of input tokens might trigger it to recite one.