|
|
|
|
|
by c1ccccc1
2329 days ago
|
|
I'm surprised that it's even necessary to modify the dataset to achieve this. From what I've read, large models will often memorize their training data, and it seems like even with smaller models it should be possible to tell whether or not it was trained with some set of images, simply because the loss will be lower. |
|
Also notice that being proactive in watermarking the dataset can be desirable in some cases. For example, many datasets have large overlaps in the base images they use (but sometimes different labels), so it can be interesting to know whether a model was trained on "your" version of the dataset.