| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bostik 52 days ago

> unless you keep the poisoning attack strictly inaccessible to the public

This is likely impossible. As the in-vogue breed of model extraction methods ("distillation attacks") demonstrates, you can infer the underlying training and/or fine-tuning of a model with a series of carefully constructed prompts.

Another name of model poisoning? Adversarial fine-tuning.