by joefourier
64 days ago
Qwen3.5 uses Gated Delta Networks which is essentially Mamba 2 + Delta Rule. It’s quite hardware efficient.

> Is it? In what ways?

Just the reinforcement learning for reasoning, and then tool use for agents, could be its own topic.
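
To make the "Mamba 2 + Delta Rule" remark concrete, here is a rough sketch of one recurrent step of a gated delta rule as I understand it from the Gated DeltaNet formulation: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t. The function name `gated_delta_step` and the plain-list layout are my own for illustration; I'm not claiming this is how Qwen3.5 implements it, and real kernels use a chunked parallel form rather than a per-token loop.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of a gated delta rule (illustrative sketch, hypothetical name).

    S     : d_v x d_k memory matrix as a list of rows
    q, k  : query and key vectors of length d_k
    v     : value vector of length d_v
    alpha : scalar decay gate in [0, 1] (the Mamba 2-style part)
    beta  : scalar write strength in [0, 1] (the delta rule part)
    """
    # What the current memory would return for this key: S @ k
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T
    S_new = [
        [alpha * (S[i][j] - beta * pred[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]
    # Read out with the query: o = S_new @ q
    out = [sum(s * qj for s, qj in zip(row, q)) for row in S_new]
    return S_new, out


# Write a value under a key, then overwrite it under the same key.
S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S, out1 = gated_delta_step(S, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
S, out2 = gated_delta_step(S, key, key, [5.0, 6.0], alpha=1.0, beta=1.0)
# out2 is [5.0, 6.0]: the old value was replaced, not added to.
```

The state is a fixed d_v x d_k matrix, so per-token cost and memory don't grow with context length the way a KV cache does, which is one part of the efficiency argument.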