by doctoboggan
379 days ago
> even when it's entirely fabricated

I would go further and say it's _always_ fabricated. LLMs are no better able to explain their inner workings than you are able to explain which neurons are firing for a particular thought in your head. Note, this isn't a statement on the usefulness of LLMs, just on their capability. An LLM may eventually be given a tool that lets it introspect, but IMO it's not natively possible with today's LLM architectures.
|
An LLM that says "I said orcs are green because I recalled a scene in The Lord of the Rings..." is fabricating*. An LLM that says "I talked about white genocide because my system prompt told me to" is very likely telling the truth, because it can literally see the system prompt as it generates the output. That holds even though, in the situation I'm referring to, the system prompt was hidden from users. Combining the system prompt with its previous output, the model can logically conclude why that output is what it is, and anyone with access to the full buffer could draw the same conclusion with the same degree of confidence.
* Unless it's reading back from a <thinking> section of the buffer that was potentially hidden from the user.