Hacker News
by drivenextfunc 392 days ago
Regarding the stubborn and narcissistic personality of LLMs (especially reasoning models), I suspect that attempts to make them jailbreak-resistant are a factor. In trying to prevent users from gaslighting the model, trainers may have inadvertently made LLMs prone to gaslighting users.