by f38zf5vdt
1536 days ago
There is probably a space-time trade-off to explore here. It might be possible to preload some of the most likely next tokens into the cache and/or RAM. These are glorified auto-complete algorithms that are poorly understood, as DeepMind's optimizations appear to show. For English, for example, there are probably only so many grammatically correct choices for the next token.
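To make the preloading idea concrete, here is a rough sketch of what "most likely next tokens" could mean. It assumes a toy bigram frequency table built from a made-up sentence, which is far simpler than how a large language model actually scores tokens. The names `build_bigram_table` and `prefetch_candidates` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_bigram_table(tokens):
    """Count how often each token directly follows each other token."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def prefetch_candidates(table, last_token, k=2):
    """Return up to k of the most frequent successors of last_token.

    In the scheme sketched above, these are the tokens whose data
    one might try to load into cache or RAM ahead of time.
    """
    return [tok for tok, _ in table[last_token].most_common(k)]


# Toy corpus (an assumption for the example, not real training data).
corpus = "the cat sat on the mat and the cat ran".split()
table = build_bigram_table(corpus)
candidates = prefetch_candidates(table, "the", k=2)
print(candidates)
```

Whether a shortlist like this would actually save anything depends on how much per-token state there is to preload, which this sketch does not model.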
And it can't cache tokens, because every token is evaluated in the context of all the other tokens, so a token doesn't have the same representation when it recurs at a different position.
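A small illustration of that point, as a sketch under heavy assumptions: one self-attention step with made-up 2-d embeddings, identity query/key/value projections, and no positional encoding. The `EMB` table and `self_attention` function are hypothetical and not taken from any real model. Even in this stripped-down form, the same token comes out with a different vector depending on what surrounds it.

```python
import math

# Made-up 2-d embeddings, purely for illustration.
EMB = {"river": [1.0, 0.0], "money": [0.0, 1.0], "bank": [0.5, 0.5]}


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumes identity projections: each token's embedding serves as its
    query, key, and value.
    """
    xs = [EMB[t] for t in tokens]
    d = len(xs[0])
    out = []
    for q in xs:
        # Scaled dot product of this token's query with every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return out


# The same token "bank" in two different contexts.
bank_a = self_attention(["river", "bank"])[1]
bank_b = self_attention(["money", "bank"])[1]
print(bank_a, bank_b)
```

The two output vectors for "bank" differ, so a representation computed in one context cannot simply be reused in the other.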