| HN Mirror



	by amenhotep 921 days ago
	This is a guess (I have read some of the paper, but it does not seem to explain this), but presumably you work backwards from the result you want. Some vector coming out of the classifier model represents "cat" and another represents "dog"; you take the difference between these two vectors and work back through the layers of matrix multiplication, finding places where small perturbations in the input data force the classifier to make exactly the errors you need.
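	To make that guess concrete, here is a minimal sketch in plain Python. It is not Nightshade's actual method, only an illustration of the "work backwards" idea as a targeted sign-gradient step. Everything in it is an assumption made up for the example: the two-layer linear toy model, the weights `W1` and `W2`, the input `x`, the step size `eps`, and the helper names.

```python
# Toy two-layer linear "classifier": logits = W2 @ (W1 @ x).
# All weights and inputs are made-up numbers for illustration only.
CAT, DOG = 0, 1

W1 = [[1.0, 0.0, 0.0, 1.0],
      [0.0, 1.0, 0.0, -1.0],
      [0.0, 0.0, 1.0, 0.0]]
W2 = [[1.0, -1.0, 0.5],    # row producing the "cat" logit
      [-1.0, 1.0, -0.5]]   # row producing the "dog" logit


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def predict(x):
    """Return the index of the largest logit."""
    logits = matvec(W2, matvec(W1, x))
    return max(range(len(logits)), key=logits.__getitem__)


def sign(g):
    return (g > 0) - (g < 0)


def targeted_perturbation(x, source, target, eps):
    # Direction in output space: "more target, less source".
    d_logits = [0.0] * len(W2)
    d_logits[target], d_logits[source] = 1.0, -1.0
    # Work backwards through each matrix multiplication.
    grad_h = matvec(transpose(W2), d_logits)
    grad_x = matvec(transpose(W1), grad_h)
    # Nudge every input feature by at most eps in the helpful direction.
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


x = [0.2, -0.2, 0.2, 0.1]
x_adv = targeted_perturbation(x, CAT, DOG, eps=0.25)
print(predict(x), predict(x_adv))
```

	In this toy setup the original input is classified as cat and the perturbed one as dog, with no feature moved by more than `eps`. Note that the step is computed from this particular model's weights, so the sketch says nothing about whether such a perturbation would carry over to a different model.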

1 comment

dragonwriter 921 days ago

They claim that the poisoning is transferable across different models, which would not be the case if that were true.

OTOH, given this group's track record with their earlier poisoning effort (Glaze) vs. the adulatory press that they managed to arrange, I don't expect much from Nightshade despite the similarly adulatory press. Great media relations game, though.
