by magicalhippo 258 days ago
They have just four small layers, rather than several dozen large layers. Off the top of my head, Gemma 3 27B has 63 layers or so. Each of Gemma's layers is also much larger, since the model has a far higher number of embedding dimensions.

Hence they end up with ~7 million weights or parameters, rather than billions.
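
To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes the common approximation of about 12 * d^2 weights per transformer layer (4 * d^2 for the attention projections plus 8 * d^2 for an MLP with 4x expansion), and it ignores embeddings, biases and norms. The embedding widths (384 and 5376) are my own illustrative guesses, not figures taken from either model's config, and `approx_params` is just a made-up helper name.

```python
def approx_params(num_layers: int, d_model: int) -> int:
    """Rough transformer weight count: ~12 * d_model^2 per layer.

    Assumption: 4*d^2 for attention (Q, K, V, output projections)
    plus 8*d^2 for an MLP with 4x expansion. Embeddings, biases and
    normalization weights are ignored.
    """
    return 12 * d_model * d_model * num_layers


# Hypothetical tiny model: 4 layers, assumed embedding width of 384.
small = approx_params(num_layers=4, d_model=384)

# Large model: ~63 layers as quoted above, assumed embedding width of 5376.
large = approx_params(num_layers=63, d_model=5376)

print(f"small: {small / 1e6:.1f}M weights")
print(f"large: {large / 1e9:.1f}B weights")
print(f"ratio: ~{large // small}x")
```

The layer count only accounts for a factor of ~16 here. Most of the gap comes from the embedding width, since the per-layer weight count grows with its square.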