by magicalhippo
258 days ago
They have just four small layers, rather than several dozen large ones. Off the top of my head, Gemma 3 27B has 63 layers or so. Its layers are also much larger, since it has a much larger number of embedding dimensions. Hence these small models end up with ~7 million weights (parameters), rather than billions.
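
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes a plain transformer block with about 12 * d_model^2 weights per layer (4 * d^2 for attention plus 8 * d^2 for an MLP with 4x expansion) and ignores embeddings, biases and norms. The function name and the widths 384 and 5000 are my own illustrative assumptions, not the actual configuration of either model, and Gemma's real blocks differ from this simple formula.

```python
def estimate_params(n_layers: int, d_model: int) -> int:
    """Rough weight count for a plain transformer stack.

    Assumes ~12 * d_model^2 weights per layer:
    4 * d^2 for attention (Q, K, V, output projections)
    plus 8 * d^2 for an MLP with 4x expansion.
    Embeddings, biases and norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


# Hypothetical tiny model: four layers, assumed width of 384.
small = estimate_params(n_layers=4, d_model=384)

# Hypothetical large model: 63 layers, assumed width of 5000.
large = estimate_params(n_layers=63, d_model=5000)

print(f"small: {small:,} weights")
print(f"large: {large:,} weights")
print(f"ratio: roughly {large // small:,}x")
```

The layer count only scales the total linearly, while the width enters squared, so the difference in embedding dimensions accounts for most of the gap between millions and billions.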