by i_am_proteus
844 days ago
> At Supermaven we've developed and trained from scratch a new neural network architecture which is more efficient than a Transformer (the current standard architecture) at integrating information across a long context window.

Clearly something proprietary, but between this and Gemini's claimed 10M tokens, assuming there's no RAG... I'm curious what might be happening behind the scenes.

People think Gemini 1.5 is a Sparse Mixture of Experts (SMoE) model.
Another option is Self-Extend: https://arxiv.org/abs/2401.01325
That paper also refers back to other context-extension options such as YaRN.
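
For anyone who wants the gist of Self-Extend without reading the paper, here is a rough sketch of its position remapping as I understand it: nearby tokens keep their exact relative positions, and distant tokens share floor-divided "grouped" positions, so the largest relative position the model sees stays inside what it was trained on. The function names and the toy numbers are mine, not from the paper's code, and this only shows the position arithmetic, not the attention itself.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int, group_size: int, neighbor_window: int) -> int:
    """Relative position a query at q_pos uses for a key at k_pos (causal, so k_pos <= q_pos).

    Hypothetical helper for illustration; group_size and neighbor_window are the
    two hyperparameters of the method.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: key must not come after the query")
    distance = q_pos - k_pos
    if distance < neighbor_window:
        # Neighbor attention: exact relative position, as in the unmodified model.
        return distance
    # Grouped attention: floor-divide both positions, then shift the query side so
    # the grouped range starts where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size + shift - k_pos // group_size


def max_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> int:
    """Largest relative position the last token sees over the whole sequence."""
    last = seq_len - 1
    return max(
        self_extend_rel_pos(last, k, group_size, neighbor_window)
        for k in range(seq_len)
    )


if __name__ == "__main__":
    # Toy setting: 64 tokens, groups of 4, exact positions for the nearest 8 tokens.
    print(max_rel_pos(64, group_size=4, neighbor_window=8))
```

In the toy setting the last token would normally see relative positions up to 63; with the remapping the maximum is much smaller, which is the whole trick: no fine-tuning, just reusing positions the model already knows.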