| I'm not sure the definition of "intention" the article suggests is a useful one. He tries to make it sound like he's being conservative: > That is, we should ascribe intentions to a system if and only if it helps to predict and explain the behaviour of the system. Whether it really has intentions beyond this is not a question I am attempting to answer (and I think that it is probably not determinate in any case). And yet, I think there's room to argue that LLMs (as currently implemented) cannot have intentions: not because of their capabilities or behaviors, but because we know how they work (mechanically, at least), and that mechanism is incompatible with useful definitions of the word "intent." Primarily, they are pure functions that accept a sequence of tokens and return the next token. The model itself is stateless, and it doesn't seem right to me to ascribe "intent" to a stateless function, even if the function is capable of modeling certain aspects of chess. Otherwise, we are in the somewhat absurd position of needing to argue that all mathematical functions "intend" to yield their result. Maybe you could go there, but it seems to be torturing language a bit, just like people who advocate definitions of "consciousness" wherein even rocks are a "little bit conscious." |
Nevertheless, the human would be acting intentionally (for in-distribution impulse patterns) for the brief period of simulation.
Fine-tuning and RLHF also seem to impart more intentionality to the pure stateless models. Not all of the text the LLMs were pretrained on was the output of helpful AI assistants avoiding harmful outputs, yet the resulting models do in fact behave like AI assistants unless prompted with more out-of-distribution context or intentional jailbreaks.
What word would you use instead of "intention" for the property that RLHF and fine-tuning create? It's goal-oriented behavior with some world-modeling ability in achieving the goal, even if it's far from robust. If the LLM is only simulating an AI assistant, it seems to me that a larger fraction of its total function is dedicated to simulating the intention of that assistant. Creating a simulator of intentional behavior is, I think, entirely novel.