|
|
|
|
|
by syntaxing
166 days ago
|
|
I feel like calling it a “30B” model is slightly disingenuous. It’s a 30B-A3B. So only 3B parameters is active at a given time. While still impressive nevertheless, being able to get 8T/s for a “A3B” compared to a dense 30B is very different. |
|
It punches well above the weight class expected from 3B active parameters. You could build the bear in Spielberg's "AI" with this thing, if not the kid.