Hacker News new | ask | show | jobs
by mdp2021 533 days ago
> Current best practice for large scale language modeling is to operate at the token level, i.e. to learn to predict the next tokens given a sequence of preceding tokens. There is a large body of research on improvements of LLMs, but most works concentrate on incremental changes and do not question the main underlying architecture. In this paper, we have proposed a new architecture,

For some 2024 may have ended badly,

but reading the lines above shines a great light of hope for the new year.