by jgammell
64 days ago
> hybrid Mamba/Gated linear attention layers,
Do any large-scale architectures use Mamba? I was under the impression that people don't use it yet due to a lack of efficient implementations.

> Training is also vastly more sophisticated
Is it? In what ways?
> Is it? In what ways?
Just the reinforcement learning for reasoning and the tool-use training for agents could be a topic of their own.