by johndough
102 days ago
Could you point to some more public info about the active parameter count? You said:
> and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)
I can see ~100B, but that would be near the same order of magnitude. I find ~1000B active parameters hard to believe.
|
4o and other H100-era models did indeed shrink their activated heads to far below GPT-4's, into the 10s, just like current Hopper-era Chinese open-source models, but the count went right back up post-Blackwell with the 10x L2 bump (for KV cache), in step with n log n attention mechanisms being refined. Similar story for Claude.
The fun speculation is about the true size of Gemini 3's internals, given the petabyte+ world size of their home-field IronwoodV7 systems and Jim Keller's public penchant for envisioning extreme MoE-like diversification across hundreds of dedicated sub-models built by individual teams within DeepMind.