by kmeisthax
52 days ago
The H-neuron paper[0] found something similar (if not more general): the same bits of the model responsible for hallucination also make the model a sycophant, and also make the model easier to jailbreak.

[0] https://arxiv.org/abs/2512.01797