|
|
|
|
|
by Illniyar
317 days ago
|
|
I can see this working with "evil" and "sycophantic" personas. These seem like traits that would be amenable to input and thus be detectable by manipulating the input. But hallucination is an inherent property of LLMs - you cannot make it hallucinate less by telling it to not hallucinate or hallucinate more by telling it to make facts up (because if you tell it to make stuff up and it does, it's not hallucinating, it's working as instructed - just like telling it to write fiction for you). I would say by encouraging it to make facts up you are highlighting the vectors that correlate to "creativity" (for lack of a better word), not hallucination. |
|
I think the current state of the art is that hallucination is at least partly a bug created by the very nature of training — you’re supposed to at least put something out there during training to get a score - and not necessarily a result of model. Overall I think that’s hopeful!
EDIT: Update, getting downvoted here.. Interesting! Here’s a link to the summary of the paper. https://www.anthropic.com/research/tracing-thoughts-language...