Hacker News
by jgammell 64 days ago
> hybrid Mamba/Gated linear attention layers,

Do any large-scale architectures use Mamba? I was under the impression that people don't use it yet due to a lack of efficient implementations.

> Training is also vastly more sophisticated

Is it? In what ways?

Qwen3.5 uses Gated Delta Networks, which are essentially Mamba 2 + the delta rule. They're quite hardware-efficient.
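For anyone wondering what "Mamba 2 + delta rule" means concretely, here is a rough sketch of the gated delta rule recurrence as described in the Gated Delta Networks paper: S_t = S_{t-1} (alpha_t (I - beta_t k_t k_t^T)) + beta_t v_t k_t^T, with output o_t = S_t q_t. The alpha_t decay gate is the Mamba-2-style part; the "erase what is stored at k_t, then write v_t" term is the delta rule. This is a single-head, sequential reference loop under my own assumptions (L2-normalized keys, alpha and beta given as plain floats, function names are made up). It is not how Qwen or any production kernel implements it; the hardware-efficient versions use a chunkwise-parallel formulation, which this does not show.

```python
# Sketch of the gated delta rule (sequential reference form, single head).
# Assumptions (not from any specific library): state S is d_v x d_k,
# keys are L2-normalized, alpha in (0, 1] is the decay gate, beta in [0, 1]
# is the write strength. Function names are hypothetical.
import math


def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def gated_delta_step(S, k, v, alpha, beta):
    """S_t = alpha * S (I - beta k k^T) + beta v k^T."""
    d_v, d_k = len(S), len(S[0])
    # S k: the value the memory currently returns for key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    new_S = []
    for i in range(d_v):
        row = []
        for j in range(d_k):
            # Decay the old state and erase the part stored under k ...
            kept = alpha * (S[i][j] - beta * Sk[i] * k[j])
            # ... then write the new value under k.
            row.append(kept + beta * v[i] * k[j])
        new_S.append(row)
    return new_S


def readout(S, q):
    return [sum(S[i][j] * q[j] for j in range(len(q))) for i in range(len(S))]


def gated_delta_scan(qs, ks, vs, alphas, betas, d_k, d_v):
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        S = gated_delta_step(S, l2_normalize(k), v, a, b)
        outputs.append(readout(S, q))
    return outputs
```

With alpha = beta = 1, writing 3.0 and then 5.0 under the same key returns 5.0 on the second read: the delta rule overwrites, whereas plain linear attention (S += v k^T) would return 3 + 5 = 8. Dropping alpha below 1 gives the Mamba-2-style forgetting.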

> Is it? In what ways?

Reinforcement learning for reasoning alone could be its own topic, and tool use for agents another.