


	by i_am_proteus 890 days ago
	>At Supermaven we've developed and trained from scratch a new neural network architecture which is more efficient than a Transformer (the current standard architecture) at integrating information across a long context window.

	Clearly it's something proprietary, but between this and Gemini's claimed 10M-token context window, assuming there's no RAG involved, I'm curious what might be happening behind the scenes.

3 comments

There are a few options.

People think Gemini 1.5 is a Sparse Mixture of Experts (SMoE) model.
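
For context, here is a minimal sketch of generic top-k sparse MoE routing for a single token, in plain Python. The gate matrix, the toy experts, k=2, and the function name `top_k_moe` are illustrative assumptions, not anything Google has described about Gemini 1.5. The point is that only k experts run per token, so parameter count can grow without a matching growth in per-token compute. Note that this routing replaces the feed-forward layer and, on its own, does not change how attention scales with context length.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x, gate_weights, experts, k=2):
    """Route one token vector x through its top-k experts (hypothetical helper).

    x: list[float] of length d
    gate_weights: one row per expert, each a list[float] of length d
    experts: callables mapping list[float] -> list[float]
    """
    # Gate logits: one score per expert (dot product with the token).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k highest-scoring experts; this is the "sparse" part.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only k experts are evaluated for this token
        out = [o + w * yj for o, yj in zip(out, y)]
    return out, top

# Toy setup: four "experts" that just scale their input by 1, 2, 3, 4.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, chosen = top_k_moe([2.0, 1.0], gate, experts, k=2)
# Gate logits are [2.0, 1.0, -2.0, 1.5], so experts 0 and 3 are selected.
```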

This paper also refers back to other options, such as YaRN.
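
YaRN builds on RoPE, so here is a rough sketch of both: standard RoPE rotation, plus frequency interpolation in the spirit of YaRN's "NTK-by-parts" scheme. High-frequency dimension pairs (many rotations within the original context) are kept, low-frequency pairs are divided by the extension factor, and pairs in between are blended with a linear ramp. The ramp bounds alpha=1 and beta=32, the 4096-token original context, and all function names are assumptions for illustration, and the sketch leaves out YaRN's attention-temperature adjustment.

```python
import math

def rope_frequencies(dim, base=10000.0):
    # Standard RoPE: one rotation frequency per (even, odd) pair of dimensions.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def yarn_like_frequencies(dim, scale, orig_ctx, base=10000.0, alpha=1.0, beta=32.0):
    """Blend original and interpolated frequencies per dimension pair (assumed defaults)."""
    out = []
    for theta in rope_frequencies(dim, base):
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength  # rotations completed within the original context
        gamma = min(1.0, max(0.0, (r - alpha) / (beta - alpha)))
        # gamma = 1: keep theta; gamma = 0: plain position interpolation (theta / scale).
        out.append((1 - gamma) * theta / scale + gamma * theta)
    return out

def apply_rope(vec, pos, freqs):
    # Rotate each (even, odd) coordinate pair by angle pos * freq.
    out = list(vec)
    for i, f in enumerate(freqs):
        a, b = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * f), math.sin(pos * f)
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q, k = [1.0, 2.0, 0.5, -1.0], [0.3, -0.7, 1.2, 0.4]
freqs = yarn_like_frequencies(dim=4, scale=8.0, orig_ctx=4096)
# With RoPE, q.k depends only on the relative offset (2 here), not absolute positions.
near = dot(apply_rope(q, 5, freqs), apply_rope(k, 3, freqs))
far = dot(apply_rope(q, 10005, freqs), apply_rope(k, 10003, freqs))
```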

Mamba?

Definitely sounds like an SSM (state space model).
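
To make "SSM" concrete, here is a toy single-channel selective state space scan in the spirit of Mamba. The state `h` has a fixed size regardless of sequence length and every step costs the same, so a full pass is linear in sequence length rather than quadratic as with full attention. For brevity only the step size `delta` depends on the input here (Mamba also makes B and C input-dependent), the input term uses a simplified `delta * B` step, and the parameter names are hypothetical. Nothing here says what Supermaven actually built.

```python
import math

def softplus(z):
    # Numerically safe log(1 + e^z); keeps the step size positive.
    return z if z > 20 else math.log1p(math.exp(z))

def selective_scan(xs, A, B, C, w_delta, b_delta):
    """Run a diagonal linear SSM over a sequence of scalar inputs.

    A: diagonal state-matrix entries (negative for a decaying, stable state)
    B, C: input and output projection vectors, same length as A
    w_delta, b_delta: hypothetical parameters making the step size input-dependent
    """
    h = [0.0] * len(A)  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # "selective": step size depends on x
        for i in range(len(A)):
            a_bar = math.exp(delta * A[i])  # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * B[i] * x  # simplified discretization of B
        ys.append(sum(c * hi for c, hi in zip(C, h)))
    return ys

# Impulse response: with w_delta = 0, delta = softplus(0) = ln 2, so the state halves each step.
ys = selective_scan([1.0, 0.0, 0.0], A=[-1.0], B=[1.0], C=[1.0], w_delta=0.0, b_delta=0.0)
```

Because `h` never grows, memory stays constant no matter how long `xs` is, which is why SSMs keep coming up in long-context discussions.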

RoPE or Ring Attention?
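
Here is a rough single-process simulation of the Ring Attention idea. The sequence is split into blocks, each simulated device keeps its own query block while key/value blocks are passed around a ring, and partial results are merged with an online (streaming) softmax so no device ever materializes the full attention matrix. This sketch is non-causal, processes keys one at a time within each block for clarity, and uses hypothetical function names; it is checked against ordinary full attention.

```python
import math

def full_attention(q, keys, values):
    # Reference: ordinary softmax attention for one query over all keys.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def ring_attention(queries, keys, values, n_devices):
    """Simulate Ring Attention for self-attention (len(queries) == len(keys))."""
    dim, dv = len(queries[0]), len(values[0])
    size = math.ceil(len(keys) / n_devices)
    kv_blocks = [(keys[i:i + size], values[i:i + size]) for i in range(0, len(keys), size)]
    q_blocks = [queries[i:i + size] for i in range(0, len(queries), size)]
    n_blocks = len(kv_blocks)
    outputs = []
    for d, q_block in enumerate(q_blocks):  # each iteration plays one "device"
        # Per-query running state: max score, softmax denominator, weighted value sum.
        state = [(-math.inf, 0.0, [0.0] * dv) for _ in q_block]
        for step in range(n_blocks):
            # At ring step `step`, device d holds the K/V block that started on device d - step.
            k_block, v_block = kv_blocks[(d - step) % n_blocks]
            for qi, q in enumerate(q_block):
                m, z, acc = state[qi]
                for k, v in zip(k_block, v_block):
                    s = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                    m_new = max(m, s)
                    scale = math.exp(m - m_new)  # rescale earlier sums to the new max
                    p = math.exp(s - m_new)
                    z = z * scale + p
                    acc = [a * scale + p * vj for a, vj in zip(acc, v)]
                    m = m_new
                state[qi] = (m, z, acc)
        for _, z, acc in state:
            outputs.append([a / z for a in acc])
    return outputs

qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0]]
vs = [[1.0], [2.0], [3.0], [4.0], [5.0]]
ring = ring_attention(qs, qs, vs, n_devices=2)
```

In the real technique, as I understand it, the blocks live on separate accelerators and the K/V transfer overlaps with compute, so per-device memory scales with the block size rather than the full sequence length.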