|
|
|
|
|
by mdp2021
533 days ago
|
|
> Current best practice for large scale language modeling is to operate at the token level, i.e. to learn to predict the next tokens given a sequence of preceding tokens. There is a large body of research on improvements of LLMs, but most works concentrate on incremental changes and do not question the main underlying architecture. In this paper, we have proposed a new architecture, For some 2024 may have ended badly, but reading the lines above shines a great light of hope for the new year. |
|