|
|
|
|
|
by dijksterhuis
898 days ago
|
|
Sure. The trick is to never let your datasets be public. Then no-one can ever work out exactly what the model was trained on. https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobot... edit: or do some fancy MITM thing on wherever you host the data. some random person on the interwebs? give them clean data. our GPU training servers? modify these specific training examples during the download. edit2: in case it's not clear from ^ ... it depends on the threat model. "can it be done in this specific scenario". my initial comment's threat model has code is public, data is not. second threat model has code + data are public, but training servers are not. |
|