by kadushka
437 days ago
There are LLMs that do not generate one token at a time: https://arxiv.org/abs/2502.09992 They do not reason significantly better than autoregressive LLMs, which makes me question whether “one token at a time” is the bottleneck. Also, LeCun has been pushing his JEPA idea for years now, with not much to show for it. With his resources, one would hope to have seen its benefits over the current state-of-the-art models by now.