Hacker News
More Layers Unlock 2^N Transformer Context Depth with Divide and Conquer (ml-mike.com)
5 points by michael_lutz 337 days ago
1 comment

Context windows are now 1M+ tokens, but context depth is limited. Often the answer is hidden behind layers of linked information, but an attention block can only resolve one link at a time. We trained a tiny 5-layer model that beats GPT-4.5 on a variable evaluation task requiring deep, recursive reasoning. How? It learned a divide-and-conquer mechanism.
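A rough way to see why N layers could reach 2^N links is pointer doubling (pointer jumping), a classical divide-and-conquer technique. The sketch below assumes this as an analogy only. It is not the author's model, training setup, or code. Every variable is assigned from the next one in a chain. A "one hop per layer" reader advances one link per layer. Pointer doubling instead composes the link map with itself each round, so after k rounds every position points 2^k hops ahead. The names `make_chain`, `resolve_one_hop_per_layer`, and `pointer_doubling` are hypothetical.

```python
def make_chain(length):
    """Hypothetical chain task: variable i is assigned from variable i + 1,
    and the last variable holds the value (it points to itself)."""
    return [min(i + 1, length) for i in range(length + 1)]


def resolve_one_hop_per_layer(links, start, num_layers):
    """Follow exactly one link per layer, starting from `start`."""
    node = start
    for _ in range(num_layers):
        node = links[node]
    return node


def pointer_doubling(links, num_layers):
    """Each round replaces every pointer with its pointer's pointer,
    updating all positions in parallel. After k rounds each entry
    has skipped 2**k links (or stopped at the terminal value)."""
    jump = list(links)
    for _ in range(num_layers):
        jump = [jump[jump[i]] for i in range(len(jump))]
    return jump


chain = make_chain(32)  # 32 links between variable 0 and the value
print(resolve_one_hop_per_layer(chain, 0, 5))  # 5: only 5 links resolved
print(pointer_doubling(chain, 5)[0])  # 32: 2**5 links resolved
```

With 5 rounds, the one-hop reader stops 5 links in. The doubling version reaches the end of a 32-link chain because each round halves the remaining distance for every position at once.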
Nice. Does this give general improvements on models (other benchmarks, etc.), or is it specific to narrow domains?
That's a really interesting question, and it's one I'd love to answer in future work. This blog post mostly focuses on characterizing context depth limits.