by rasbt 848 days ago
> Gemma is a +9B model

Yes, that's correct. It's 9.3B parameters if you count the embedding layer and the final projection layer separately. However, since they used weight tying, the two layers share one weight matrix, so the adjusted count is 8.5B, as discussed in the article.
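To make the counting concrete, here is a minimal Python sketch. The 9.3B and 8.5B totals are the figures quoted above; the function name, the vocabulary size (256,000), the hidden dimension (3,072), and the "everything else" count are assumptions chosen for illustration and are not stated in this thread.

```python
def count_params(vocab_size, hidden_dim, other_params, tied):
    """Total parameters for a model with an input embedding and an output projection."""
    embedding = vocab_size * hidden_dim   # input embedding matrix
    projection = vocab_size * hidden_dim  # final projection, same shape
    if tied:
        # With weight tying, both layers share one matrix, so count it once.
        return other_params + embedding
    return other_params + embedding + projection


# Assumed values for illustration only (not taken from this thread).
VOCAB_SIZE = 256_000
HIDDEN_DIM = 3_072
OTHER_PARAMS = 7_750_000_000  # hypothetical count for all remaining layers

untied = count_params(VOCAB_SIZE, HIDDEN_DIM, OTHER_PARAMS, tied=False)
tied = count_params(VOCAB_SIZE, HIDDEN_DIM, OTHER_PARAMS, tied=True)
print(f"untied: {untied / 1e9:.1f}B, tied: {tied / 1e9:.1f}B")
```

Under these assumed inputs, the two totals differ by exactly one vocabulary-by-hidden-dimension matrix, which is where the gap between the two quoted figures would come from.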


Which still rounds to 9B and is 21.4% larger than 7B.
Yes, it's definitely unfair to count it as a 7B model. In that case, we could call Llama 2, which is 6.6B parameters, a 6B (or even 5B) parameter model.
Except 6.6 rounds to 7. That’s completely reasonable. Arguing otherwise is pedantic.
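For anyone who wants to check the rounding and the 21.4% figure from this thread, a small Python sketch follows. The helper names are hypothetical. It uses `decimal` with half-up rounding because Python's built-in `round()` rounds halves to the nearest even integer, which would turn 8.5 into 8.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value):
    """Round a decimal string to the nearest integer, with .5 rounding up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def percent_larger(actual, nominal):
    """How much larger `actual` is than `nominal`, in percent."""
    return (Decimal(actual) - Decimal(nominal)) / Decimal(nominal) * 100


gemma = round_half_up("8.5")         # tied count from the thread, in billions
llama = round_half_up("6.6")         # Llama 2 count from the thread, in billions
excess = percent_larger("8.5", "7")  # 8.5B relative to the nominal 7B
print(gemma, llama, f"{excess:.1f}%")
```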