by bostik
52 days ago
> unless you keep the poisoning attack strictly inaccessible to the public

This is likely impossible. As the in-vogue breed of model extraction methods ("distillation attacks") demonstrates, you can infer the underlying training and/or fine-tuning of a model with a series of carefully constructed prompts.

Another name for model poisoning? Adversarial fine-tuning.