Hacker News new | ask | show | jobs
by d-z-m 853 days ago
> Also, Gemma is a +9B model. I think it's not okay that Google compared it with Mistral and Llama 2 (7B) models.

They say it's because they're not counting embedding parameters[0]. Although apparently even with the embedding parameters subtracted it still rounds to 8B not 7B. From what understand, rounding to the nearest B is the standard. Seems slightly disingenuous to call it 7B, but not a big deal IMO since I don't hear anyone saying this model is outperforming popular OSS 7Bs.

[0]: https://huggingface.co/google/gemma-7b/discussions/34