by cgdl
447 days ago
Thank you. In my mind, "planning" doesn’t necessarily imply higher-order reasoning but rather some form of search, ideally with backtracking. Of course, architecturally, we know that can’t happen during inference. Your example of the indefinite article is a great illustration of how this illusion of planning might occur. I wonder if anyone at Anthropic could compare the two cases (some sort of minimal/differential analysis) and share their insights.
https://transformer-circuits.pub/2025/attribution-graphs/bio...
There are several interesting properties:
- Something you might characterize as "forward search" (generating candidates for the word at the end of the next line, given rhyming scheme and semantics)
- Representing those candidates in an abstract way (the features active are general features for those words, not "motor features" for just saying that word)
- Holding many competing/alternative candidates in parallel
- Something you might characterize as "backward chaining", where you work backwards from these candidates to "write towards them".
With that said, I think it's easy for these discussions to devolve into philosophical arguments about what words like "planning" mean. As long as we agree on what is going on mechanistically, I'm honestly pretty indifferent to what we call it. I spoke to a wide range of colleagues, including some at other institutions, and there was pretty widespread agreement that "planning" was the most natural term. But I'm open to other suggestions!