Hacker News
by bostik 52 days ago
> unless you keep the poisoning attack strictly inaccessible to the public

This is likely impossible. As the in-vogue breed of model extraction methods ("distillation attacks") demonstrates, you can infer the underlying training and/or fine-tuning of a model with a series of carefully constructed prompts.

Another name for model poisoning? Adversarial fine-tuning.