by dijksterhuis 898 days ago
Sure. The trick is to never let your datasets be public. Then no-one can ever work out exactly what the model was trained on.

https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobot...

edit: or do some fancy MITM thing wherever you host the data. Some random person on the interwebs? Give them clean data. Your own GPU training servers? Modify specific training examples during the download.

edit2: in case it's not clear from the above, it depends on the threat model, i.e. "can it be done in this specific scenario?". In my initial comment's threat model, the code is public but the data is not. In the second threat model, the code and data are both public, but the training servers are not.
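
To make the second threat model concrete, here is a toy sketch of why an outside audit wouldn't notice anything. Everything in it is made up for illustration: the subnet, the IPs, the two byte strings, and the `serve_dataset` helper are all hypothetical, and a real setup would sit in front of an actual file host rather than return hard-coded bytes.

```python
import hashlib
import ipaddress

# All values below are hypothetical placeholders for illustration.
TRAINING_SUBNET = ipaddress.ip_network("10.0.0.0/24")  # assumed private training cluster
CLEAN = b"example_1,label=cat\n"      # what the public is meant to see
MODIFIED = b"example_1,label=dog\n"   # what the training servers receive


def serve_dataset(client_ip: str) -> bytes:
    """Return the bytes a given client would receive from the data host."""
    if ipaddress.ip_address(client_ip) in TRAINING_SUBNET:
        return MODIFIED
    return CLEAN


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# The digest that gets published alongside the "public" dataset.
published_digest = sha256_hex(CLEAN)

auditor_copy = serve_dataset("203.0.113.7")  # some random person on the internet
trainer_copy = serve_dataset("10.0.0.42")    # a machine inside the training subnet

# The outside auditor's download matches the published digest...
print(sha256_hex(auditor_copy) == published_digest)
# ...while the copy that was actually trained on does not.
print(sha256_hex(trainer_copy) == published_digest)
```

The point of the sketch is only that a checksum verified from outside says nothing about what the training servers downloaded, since nobody outside can observe that copy.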

1 comments

Model reverse engineering is a pretty cool research area, and one big part of it is figuring out the training sets :) This has been useful for detecting when modelers include benchmark eval sets in their training data (!), but it can also be used to inform data poisoning attacks.

> modelers include benchmark eval sets in their training data

:sighs: some things never change, do they?