by EricLeer 1112 days ago
I wonder if it would be possible to probe what a model was trained on by using prompts that can only be answered well if certain data was in the training set.

For instance, suppose I have some body of text that can't be found anywhere else on the internet. If the model's reply references information from that text in some way, you can be fairly certain the text was used in training.

The hard part is probably finding such a body of text.
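
A minimal sketch of that probe in Python, assuming you already hold the private text and have captured the model's reply as a plain string. The names `word_ngrams` and `overlap_score` and the 5-word window are illustrative choices, not part of any library. The score is the fraction of the reply's word n-grams that also occur in the private text; a high value only hints at memorization and is not proof.

```python
def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(private_text, model_reply, n=5):
    """Fraction of the reply's n-grams that also appear in the private text."""
    reply_grams = word_ngrams(model_reply, n)
    if not reply_grams:
        # Reply is shorter than n words, so there is nothing to compare.
        return 0.0
    shared = reply_grams & word_ngrams(private_text, n)
    return len(shared) / len(reply_grams)
```

Long verbatim runs are unlikely to show up by chance, which is why the sketch compares 5-word windows rather than single words. Punctuation is not stripped here, so a real probe would probably want to normalize both texts first.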

1 comment

That premise was published in a NeurIPS paper not long ago:

Radioactive data: tracing through training

    Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value).
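
To give a feel for what a p-value means in this setting, here is a rough Python sketch. It is not the paper's procedure: it assumes the mark is a secret direction (`carrier`, a hypothetical name) and that you can extract some vector from the trained model to compare against it. The p-value is estimated by Monte Carlo as the share of random directions that align with the carrier at least as well as the model's vector does.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_p_value(vector, carrier, trials=2000, seed=0):
    """Estimate how often a random direction aligns with carrier as well as vector does."""
    rng = random.Random(seed)
    observed = cosine(vector, carrier)
    hits = 0
    for _ in range(trials):
        # Gaussian coordinates give a direction that is uniform on the sphere.
        null = [rng.gauss(0.0, 1.0) for _ in carrier]
        if cosine(null, carrier) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

A small p-value says the alignment is unlikely to be an accident, while a value near 0.5 is what an unmarked model would typically produce. With 2000 trials the estimate cannot go below roughly 0.0005, so more trials are needed to claim stronger confidence.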