by anon373839
544 days ago
Agreed. But also, I think the highly anthropomorphic framing (“the model is aware”, “the model believes”, “the model planned”) obscures the true nature of the experiments. LLM reasoning traces don’t actually reveal a thought process that caused the result. (Perhaps counterintuitive, since these are autoregressive models.) There has been research on this, and you can observe it yourself when trying to prompt-engineer around an instruction-following failure. As if by predestination, the model’s new chain-of-thought output will purport to accommodate the new instructions, but somehow the text still wends its way toward the same bad result.