|
|
|
|
|
by dhbradshaw
457 days ago
|
|
Quote: The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2
for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe
significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-
4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across
benchmarks. We release all our models to the community. Really cool |
|