by ljhsiung 1197 days ago

Speculating on this exact term: for a few years now the ML world has had this notion of "attacks on neural networks" [1], [2]. That is, forcing the model to produce a "bad" output, or flooding it with crafted input data to skew its weights toward what an adversary wants. Say, classifying a cat as a mountain, or, in a self-driving context, making a Tesla misclassify a stop sign.
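To make the first kind of attack concrete, here is a toy sketch of nudging an input so a classifier flips its answer. Everything in it is made up for illustration: the three-weight linear "model", the cat/mountain labels, and the step size `eps`. It is not taken from either linked paper, and real attacks target much larger networks, but the idea of stepping the input along the sign of the gradient is the same.

```python
# Toy linear classifier. WEIGHTS, BIAS and the labels are hypothetical,
# chosen only so the arithmetic is easy to follow.
WEIGHTS = [0.5, -1.0, 0.25]
BIAS = 0.0


def score(x):
    """Weighted sum of the input features plus the bias."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def classify(x):
    """Positive score means "cat", anything else means "mountain"."""
    return "cat" if score(x) > 0 else "mountain"


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def evade(x, eps):
    """Shift every feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the input
    is just WEIGHTS, so stepping against sign(WEIGHTS) lowers the score by
    eps * sum(|w|). This only pushes inputs toward "mountain".
    """
    return [xi - eps * sign(w) for xi, w in zip(x, WEIGHTS)]


x = [1.0, 0.5, 2.0]
x_adv = evade(x, eps=0.5)
print(classify(x), "->", classify(x_adv))
```

No feature moves by more than `eps`, yet the label changes, which is the whole trick. The second kind of attack in the paragraph above (skewing the weights) happens at training time instead and is not shown here.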

Applied to ChatGPT, a charitable take on this self-aggrandizement would be that the speaker has deep knowledge of the model they're attacking, in the same way a reverse engineer generally knows how system X is built. But I'm just being nice.

[1] https://proceedings.neurips.cc/paper/2019/file/7fea637fd6d02...

[2] https://www.usenix.org/system/files/sec21-vicarte.pdf