|
|
|
|
|
by Nesco
453 days ago
|
|
LLMs are autoregressive models. However, the notion of order in ASTs might be nonexistent, especially for parallel branches of computation/control flow.
You could attempt to untangle each branch into N sequences, but this would erase control-flow information. Even when there is an objective ordering of the children of every node, you still have four traversal options: {preorder, postorder} × {BF, DF}. Note: For children lacking an objective ordering, you might apply generic rules to define a traversal order, but you’d end up with as many depth-first traversals as there are possible orders—essentially a crude heuristic. If you want the evaluation order to be dynamic at each step (e.g., using RL), the complexity grows geometrically worse.
That’s been my experience tinkering with a custom AST DSL for ARC-AGI. |
|
My hunch is that LLMs don’t need to solve the whole traversal space — they just need a clean, abstract interface. Even parallel branches can be normalized into a schema that the model can reason over consistently. And in practice, you rarely need full recursion or a complete tree walk to understand a node — but having that option unlocks deeper comprehension when it counts.
This kind of structural understanding would also massively improve Copilot-style tools, especially for less popular libraries where token-level familiarity breaks down. If models could reason over types and structure instead of guessing based on frequency, completions would be a lot more reliable outside the top 1% of APIs.