by i_am_proteus 844 days ago
>At Supermaven we've developed and trained from scratch a new neural network architecture which is more efficient than a Transformer (the current standard architecture) at integrating information across a long context window.

Clearly something proprietary, but between this and Gemini's claimed 10M tokens, assuming there's no RAG... I'm curious what might be happening behind the scenes.

3 comments

There are a few options.

People think Gemini 1.5 is a Sparse Mixture of Experts (SMoE) model.

Another one is Self-Extend: https://arxiv.org/abs/2401.01325

That paper also refers back to other options like YaRN, etc.
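
For anyone wondering what Self-Extend actually does: as I read the paper, it leaves the model untouched and only remaps relative positions at inference time. Nearby tokens keep their exact distance, and far-away tokens get a coarser, floor-divided distance so nothing exceeds the range seen in training. A rough sketch of that mapping is below. The function and parameter names are my own, not from the paper's code, and a real implementation applies this inside the attention computation rather than one pair at a time.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int,
                        group_size: int, neighbor_window: int) -> int:
    """Relative position a query at q_pos would use for a key at k_pos.

    Hypothetical helper for illustration only; names are assumptions.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: key must not come after query")

    distance = q_pos - k_pos

    # Close tokens: ordinary attention with the exact relative distance.
    if distance < neighbor_window:
        return distance

    # Distant tokens: positions are floor-divided by the group size, then
    # shifted so the grouped range starts where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


if __name__ == "__main__":
    for k in (98, 92, 0):
        print(k, self_extend_rel_pos(100, k, group_size=4, neighbor_window=8))
```

With a group size of 4 and a neighbor window of 8, a query at position 100 looking back at position 0 ends up with a relative position of 31 instead of 100, which is the whole trick: the largest distance the model sees grows roughly as sequence length divided by group size.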

Mamba?
Definitely sounds like an SSM.
RoPE or RingAttention?