by orbital-decay
508 days ago
That's because it uses a long CoT. The actual paper [1] [2] is about the limitations of decoder-only transformers that predict the reply directly, although it also establishes the benefits of CoT for composition. This has all been known for a long time and makes intuitive sense: you can't squeeze more computation out of the model than it can provide. The authors just formally proved it (which is no small feat). And Quanta is being dramatic with its conclusions and headlines, as always.

[1] https://arxiv.org/abs/2412.02975

[2] https://news.ycombinator.com/item?id=42889786