by d-z-m
853 days ago
> Also, Gemma is a +9B model. I think it's not okay that Google compared it with Mistral and Llama 2 (7B) models.

They say it's because they're not counting embedding parameters[0]. Apparently, though, even with the embedding parameters subtracted it still rounds to 8B, not 7B. From what I understand, rounding to the nearest B is the standard. It seems slightly disingenuous to call it 7B, but it's not a big deal IMO, since I don't hear anyone saying this model is outperforming popular OSS 7Bs.

[0]: https://huggingface.co/google/gemma-7b/discussions/34