by uejfiweun 916 days ago
How long does it generally take between model architectures like Mamba being proposed and the use of these architectures in SotA mega models like GPT or Gemini? IIUC Mamba basically eliminates restrictions on context length which would be awesome to see in the super-mega high performance models.

GPT-5 would have this enhancement
GPT-5 will not, because the T in GPT stands for Transformer and Mamba/SSMs/S6 are not Transformers.

But I would bet that we see a SOTA S6 LLM from Meta by this spring.

S6?