by deazy
398 days ago
Basically, this is what I think it is: a shared context that can act as independent shards of (mini) contexts, i.e., sub-global attention blocks or "sub-context experts" that operate somewhat independently and then scale up or compose into a larger global attention. That would be a paradigm for handling extremely long contexts. I'm trying to see if this can be tested in some way at small scale. It's worth a try if it can work, but it requires some engineering to make it possible.
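One way to poke at this at small scale might look like the sketch below. It is only my own hypothetical reading of the idea, not an established method: each shard runs ordinary attention over its own tokens (the "sub-context expert"), each shard is summarized by mean pooling, a global attention runs over those summaries, and the result is added back to every token in the shard. The function names (`sharded_attention`, `shard_size`) are made up for illustration, and the sketch uses identity projections (no learned Q/K/V weights) to keep it short. Local cost is roughly `n * shard_size` and global cost is `(n / shard_size)^2`, instead of `n^2` for full attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Plain scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def sharded_attention(x, shard_size):
    """Hypothetical two-level scheme: local attention per shard, then
    global attention over per-shard summaries. Identity projections only."""
    n, d = x.shape
    if n % shard_size != 0:
        raise ValueError("sequence length must be divisible by shard_size")
    shards = x.reshape(n // shard_size, shard_size, d)

    # 1) Each shard attends only within itself (independent "sub-context").
    local = np.stack([attention(s, s, s) for s in shards])

    # 2) Summarize each shard with a mean-pooled vector: (num_shards, d).
    summaries = local.mean(axis=1)

    # 3) Global attention across shard summaries composes the shards.
    global_ctx = attention(summaries, summaries, summaries)

    # 4) Broadcast each shard's global context back onto its tokens.
    out = local + global_ctx[:, None, :]
    return out.reshape(n, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8))
    y = sharded_attention(x, shard_size=4)
    print(y.shape)  # (16, 8)
```

Mean pooling plus additive broadcast is just the simplest composition I could think of. Learned summary tokens or cross-attention from tokens to the global summaries would be obvious things to try instead.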