Hacker News
by kadushka 437 days ago
There are LLMs which do not generate one token at a time: https://arxiv.org/abs/2502.09992

They do not reason significantly better than autoregressive LLMs, which makes me question whether “one token at a time” is really the bottleneck.
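To make “one token at a time” concrete, here is a toy sketch (not taken from the linked paper) contrasting a left-to-right decoding loop with a masked, parallel decoding loop in the spirit of diffusion-style language models. The predictor `toy_predict`, the `TARGET` string, and the confidence-based unmasking schedule are all hypothetical stand-ins, assumed only so both loops can run.

```python
from typing import List, Optional, Tuple

MASK = None  # placeholder for a position that has not been generated yet

# Hypothetical toy "model": it always predicts the characters of a fixed target
# and ignores its context. A real LLM would condition on `seq`; this stand-in
# only exists so the two decoding loops below are runnable.
TARGET = "hello world"


def toy_predict(seq: List[Optional[str]], pos: int) -> Tuple[str, float]:
    """Return (token, confidence) for position `pos` (hypothetical stand-in)."""
    # Toy confidence: earlier positions are treated as "more certain".
    return TARGET[pos], 1.0 / (1 + pos)


def autoregressive_decode(length: int) -> Tuple[str, int]:
    """Generate left to right, committing exactly one token per step."""
    seq: List[Optional[str]] = []
    steps = 0
    for pos in range(length):
        tok, _ = toy_predict(seq, pos)
        seq.append(tok)
        steps += 1
    return "".join(t for t in seq if t is not None), steps


def masked_parallel_decode(length: int, tokens_per_step: int) -> Tuple[str, int]:
    """Start fully masked; each step predicts every masked position at once
    and commits the `tokens_per_step` most confident predictions
    (a simplified, assumed unmasking schedule)."""
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    while any(t is MASK for t in seq):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        # One conceptual forward pass: predictions for all masked positions.
        preds = {i: toy_predict(seq, i) for i in masked}
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:tokens_per_step]
        for i in best:
            seq[i] = preds[i][0]
        steps += 1
    return "".join(t for t in seq if t is not None), steps


if __name__ == "__main__":
    print(autoregressive_decode(len(TARGET)))       # ('hello world', 11)
    print(masked_parallel_decode(len(TARGET), 4))   # ('hello world', 3)
```

The parallel loop needs fewer sequential steps (3 instead of 11 here), but fewer steps says nothing by itself about reasoning quality, which is the point being made above.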

Also, LeCun has been pushing his JEPA idea for years now, with not much to show for it. Given his resources, one would hope to have seen its benefits over current state-of-the-art models by now.

1 comment

From the article: LeCun has been working in some way on V-JEPA for two decades. At least it's bold, and everyone says it won't work until one day it might.