|
|
|
|
|
by magicalhippo
461 days ago
|
|
Thanks, been using Gemma 2 a lot at home as it still holds up very well and the 9B version runs great on my 2080Ti. Strong prompt adherence coupled with overall capability makes it very useful. Looking forward to trying Gemma 3. I have some dumb questions though, might as well ask. How do you decide on the model sizes? And how do you train them? Independently or are they related somehow? |
|
The models are trained with distillation from a bigger teacher. We train them independently, but for v3 we have unified the recipes for 4B-27B, to give you more predictably when scaling up and down to different model sizes.