| HN Mirror



	by magicalhippo 461 days ago
	Thanks, been using Gemma 2 a lot at home as it still holds up very well and the 9B version runs great on my 2080Ti. Strong prompt adherence coupled with overall capability makes it very useful. Looking forward to trying Gemma 3. I have some dumb questions though, might as well ask. How do you decide on the model sizes? And how do you train them? Independently or are they related somehow?


alekandreev 461 days ago

Picking model sizes is not an exact science. We look for sizes that will fit quantized on different categories of devices (e.g., low-end and high-end smartphones, laptops and 16GB GPUs, and bigger GPUs/TPUs). We also want the ratio of model width to depth (number of layers) to be consistently around 90, which we found works best.
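
For readers who want to play with the two sizing heuristics mentioned above, here is a rough back-of-the-envelope sketch. It is not the Gemma team's method: the function names, the 20% overhead allowance for activations and KV cache, and the example width and layer count are all assumptions made for illustration.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_on_device(n_params: float, bits_per_weight: float,
                   device_gb: float, overhead_fraction: float = 0.2) -> bool:
    """Check whether the weights plus an assumed overhead fit in device memory.

    overhead_fraction is a hypothetical allowance for activations and
    KV cache; real overhead depends on context length and runtime.
    """
    needed = quantized_weight_gb(n_params, bits_per_weight) * (1 + overhead_fraction)
    return needed <= device_gb


def width_to_depth_ratio(d_model: int, n_layers: int) -> float:
    """Model width divided by number of layers."""
    return d_model / n_layers


if __name__ == "__main__":
    # A 9B-parameter model at 4 bits per weight.
    print(quantized_weight_gb(9e9, 4))        # 4.5
    print(fits_on_device(9e9, 4, 16.0))       # True
    # Hypothetical shape chosen only to illustrate a ratio of 90.
    print(width_to_depth_ratio(3600, 40))     # 90.0
```

The weight-size estimate ignores per-block quantization metadata, so actual quantized files tend to be somewhat larger than this figure.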

The models are trained with distillation from a bigger teacher. We train them independently, but for v3 we have unified the recipes for 4B-27B, to give you more predictability when scaling up and down to different model sizes.
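
The comment does not say which distillation objective is used, so the following is only a generic sketch of the textbook formulation: the student is trained to match the teacher's softened output distribution by minimizing KL(teacher || student). The function names and the temperature parameter are assumptions, and a real training setup would use a tensor library rather than plain Python lists.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence from teacher to student for one token position.

    Generic knowledge-distillation loss, not a confirmed Gemma recipe.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    # Identical logits give a loss of (numerically) zero.
    print(distillation_loss(teacher, teacher))
    # A student that disagrees with the teacher gets a positive loss.
    print(distillation_loss([0.1, 1.0, 2.0], teacher))
```

In practice this per-token loss would be averaged over all positions in a batch and minimized with respect to the student's parameters only.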


magicalhippo 461 days ago

Thanks again, very interesting.

One unexpected (to me) use-case appeared not long ago when I found myself without internet but wanting to fix some non-standard Linux configuration issue. As a Windows guy I tend to web search such things, but local LLM to the rescue!

Even a smaller model like Gemma 2 9B has enough compressed knowledge that it managed to help me quickly solve my issue.

This got me thinking how such smaller, but very capable models might be a game-changer in communities where internet might not be available or too expensive for continuous use. It's almost like having a portion of the internet in a box, just add electricity.


alekandreev 461 days ago

Thank you for the feedback! This is why we are so excited to push more and more on small models for both low-end and high-end smartphones!


bguberfain 461 days ago

Can you provide more information about this “bigger teacher” model?
