by JKCalhoun
66 days ago
"…whereas 35A3B is a lot smarter…" Must. Parse. Is this a 35 billion parameter model that needs only 3 billion parameters to be active? (Trying to keep up with this stuff.) EDIT: A later comment seems to clarify: "It's a MoE model and the A3B stands for 3 Billion active parameters…"
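For anyone else trying to parse these tags, here is a rough sketch of the reading described above: total parameters first, then "A" plus the active parameters. The helper name `parse_moe_tag` and the regex are my own assumptions based on the "35A3B" shorthand, not an official naming spec, so treat it as illustrative only.

```python
import re

# Assumed pattern: "<total>B-A<active>B", with the first "B" and the dash optional,
# so both "35A3B" and "35B-A3B" match. This is a guess at the convention, not a spec.
_MOE_TAG = re.compile(r"(\d+(?:\.\d+)?)B?-?A(\d+(?:\.\d+)?)B", re.IGNORECASE)


def parse_moe_tag(tag):
    """Hypothetical helper: split a MoE size tag into total and active billions."""
    match = _MOE_TAG.search(tag)
    if match is None:
        return None  # not a tag of the assumed shape
    total = float(match.group(1))
    active = float(match.group(2))
    if total <= 0:
        return None  # avoid dividing by zero on a nonsensical tag
    return {
        "total_b": total,                    # total parameters, in billions
        "active_b": active,                  # parameters active per token, in billions
        "active_fraction": active / total,   # share of the model used per token
    }


if __name__ == "__main__":
    print(parse_moe_tag("35A3B"))
```

Under that reading, "35A3B" means roughly 35 billion parameters in total with about 3 billion active for any given token, which is the MoE point the later comment makes.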