| We’ve been working on solving the ARC-AGI benchmark and found that standard Transformers hit a hard ceiling on algorithmic search tasks (the "Compositional Drift" problem mentioned in the abstract). We decided to try a different architectural approach: the Dual-Stream Programmatic Learner (DSPL). Instead of one monolithic model, we use a Bicameral Latent Space:
1. Logic Stream: A recursive planner that handles abstract algorithmic planning.
2. Canvas Stream: An execution state that handles the pixel grid.
The Engineering Bottleneck: While this separation solves the reasoning drift (accuracy is high), the inference cost of the recursive loop is proving difficult to scale. We currently use a Gated Cross-Attention Interface to sync the two streams at every step, but this $O(N^2)$ sync cost (every token in one stream attends to every token in the other) is melting our servers under load.
My question to the HN crowd:
For those working on dual-stream or "System 2" architectures: is strictly synchronous Cross-Attention necessary? Has anyone successfully decoupled the "Planner" loop from the "Executor" loop (running them asynchronously) without breaking causality? We are debating switching to a linear-time mechanism for the interface (linear attention, or a state-space model like Mamba), but are worried about losing the "Sacred Signature" type-safety. Paper link is above. Happy to discuss the trade-offs of Recursion vs. Depth. |
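To make the question concrete, here is a toy sketch of the trade-off being asked about. It is not the DSPL implementation: the function names, the scalar gate, the random token vectors, and the placeholder executor update are all assumptions made up for illustration. It shows a gated cross-attention sync whose cost is one score per (logic token, canvas token) pair, and a loop that syncs only every `sync_every` steps, so the planner works from a stale canvas snapshot in between. Because the planner only ever reads a snapshot from the past, nothing here reads future state, but whether that staleness is acceptable is exactly the open question.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_cross_attention(queries, keys, values, gate):
    """Each query attends over all keys, then blends the result back in.

    `gate` is a hypothetical scalar in [0, 1]; a real interface would
    presumably learn it. Cost: len(queries) * len(keys) dot products.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        context = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(dim)]
        out.append([(1 - gate) * qj + gate * cj
                    for qj, cj in zip(q, context)])
    return out


def run(steps, sync_every, n_tokens=8, dim=4, seed=0):
    """Run a toy planner/executor loop, syncing every `sync_every` steps.

    Returns (sync_calls, attention_scores_computed).
    """
    rng = random.Random(seed)
    logic = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    canvas = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    sync_calls = 0
    for t in range(steps):
        if t % sync_every == 0:
            # Planner reads the current canvas; between syncs it works
            # from whatever it absorbed at the last sync (stale state).
            logic = gated_cross_attention(logic, canvas, canvas, gate=0.5)
            sync_calls += 1
        # Placeholder executor update (assumption): the canvas evolves
        # on its own every step, regardless of whether a sync happened.
        canvas = [[0.9 * x for x in row] for row in canvas]
    return sync_calls, sync_calls * n_tokens * n_tokens


every_step = run(steps=12, sync_every=1)
every_fourth = run(steps=12, sync_every=4)
print("sync every step:   ", every_step)
print("sync every 4 steps:", every_fourth)
```

With equal token counts in both streams, syncing every fourth step cuts the number of attention scores by roughly 4x in this toy setup. It says nothing about accuracy, which is the part that would need measuring on the real model.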