by alekandreev
466 days ago
Picking model sizes is not an exact science. We look for sizes that will fit, once quantized, on different categories of devices (e.g., low-end and high-end smartphones, laptops and 16GB GPUs, and bigger GPUs/TPUs). We also want the ratio of model width to depth (number of layers) to be consistently around 90, which we found works best. The models are trained with distillation from a bigger teacher. We train them independently, but for v3 we have unified the recipes for 4B-27B, to give you more predictability when scaling up and down to different model sizes.
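To make the sizing reasoning above concrete, here is a rough back-of-the-envelope sketch in Python. It is not the Gemma team's actual procedure: the function names are hypothetical, the estimate counts weights only (no KV cache, activations, or runtime overhead), and the width and layer count in the ratio check are made-up illustrative numbers, not real Gemma dimensions.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB.

    Weights only: ignores KV cache, activations, and runtime overhead,
    so real memory use will be higher.
    """
    return n_params * bits_per_weight / 8 / 2**30


def fits(n_params: float, bits_per_weight: int, budget_gib: float) -> bool:
    """True if the quantized weights alone fit within the memory budget."""
    return quantized_weight_gib(n_params, bits_per_weight) <= budget_gib


def width_to_depth_ratio(d_model: int, n_layers: int) -> float:
    """Ratio of model width (hidden size) to depth (number of layers)."""
    return d_model / n_layers


if __name__ == "__main__":
    # 4B and 27B are the sizes named in the comment; 16 GiB stands in
    # for the "16GB GPU" device category.
    for name, n_params in [("4B", 4e9), ("27B", 27e9)]:
        for bits in (4, 8, 16):
            size = quantized_weight_gib(n_params, bits)
            verdict = "fits" if fits(n_params, bits, 16) else "does not fit"
            print(f"{name} at {bits}-bit: {size:.1f} GiB, {verdict} in 16 GiB")

    # Hypothetical dimensions chosen only to illustrate a ratio of 90.
    print(width_to_depth_ratio(d_model=2700, n_layers=30))
```

Under these assumptions, a 27B model at 4 bits per weight needs roughly 12.6 GiB for its weights, which squeezes under a 16 GiB budget, while the same model at 8 bits does not.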
One unexpected (to me) use-case appeared not long ago when I found myself without internet but wanting to fix a non-standard Linux configuration issue. As a Windows guy I tend to web search such things, but a local LLM came to the rescue!
Even smaller models like Gemma 2 9B have enough compressed knowledge that one managed to help me quickly solve my issue.
This got me thinking about how such smaller but very capable models might be a game-changer in communities where internet might not be available or is too expensive for continuous use. It's almost like having a portion of the internet in a box, just add electricity.