by mysterEFrank
184 days ago
I'm surprised more attention isn't paid to this research direction, and that nobody has tried to generalize it, for example by combining the recurrence concept with next-token prediction.
That said, despite the considerable gains, this seems to be just some hyperparameter tweaking rather than a foundational improvement.
|
Here you go: https://arxiv.org/abs/2502.05171