by ljhsiung
1197 days ago
This is speculation on the exact term, but for a few years now the ML world has had this notion of "attacks on neural networks" [1], [2]. That is, forcing the model to produce a "bad" output, or flooding it with input data to skew its weights toward what an adversary wants. Say, classifying a cat as a mountain, or, in a self-driving context, making a Tesla miscategorize a stop sign. Applied to ChatGPT, a charitable take on this self-aggrandizement would be that the speaker has deep knowledge of the model they're attacking, in the same way a reverse engineer generally knows how a given system is built. But I'm just being nice.

[1] https://proceedings.neurips.cc/paper/2019/file/7fea637fd6d02...
[2] https://www.usenix.org/system/files/sec21-vicarte.pdf
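For a feel of what "forcing a bad output" can look like, here is a toy sketch of a sign-of-gradient (FGSM-style) perturbation. Everything in it is an assumption for illustration: the "model" is a hand-written linear classifier rather than a real neural network, and the weights, input, labels, and `eps` are made up. It is not the method from either linked paper.

```python
# Toy FGSM-style attack on a hypothetical linear "cat vs. mountain" classifier.
# All weights and inputs are invented for illustration.

def score(w, b, x):
    """Linear score: positive means 'cat', otherwise 'mountain'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return "cat" if score(w, b, x) > 0 else "mountain"

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(w, x, eps):
    """Move each feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score w.r.t. x is just w,
    so the attack subtracts eps * sign(w) from the input.
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [1.0, -2.0, 0.5]   # hypothetical model weights
b = 0.0                # hypothetical bias
x = [0.5, 0.1, 0.2]    # hypothetical "cat" input
eps = 0.2              # per-feature perturbation budget

x_adv = fgsm_perturb(w, x, eps)
print(predict(w, b, x), "->", predict(w, b, x_adv))
```

The point is only that a small, bounded change to each input feature can flip the label when the attacker knows the model's gradients, which is where the "deep knowledge of the model" part comes in.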