by in-silico 184 days ago
> nobody has tried to generalize it, for example by combining the recurrence concept with next-token prediction

Here you go: https://arxiv.org/abs/2502.05171

1 comment

Thanks! This seems to work incredibly well.