by light_hue_1 848 days ago
No it doesn't.

Gemma 7B is a 9B model. The name is a lie. Then they really played games with Gemma 2B as well.

I don't get how Google can be this incompetent and far behind everyone else. They have amazing people and the kinds of resources that almost no one else does but somehow need to resort to faking demos, blatant lies about model sizes, etc.

Google used to be the place everyone wanted to go. Someone at Google AI needs to be fired so they can start being productive again.

2 comments

> Gemma 7B is a 9B model. The name is a lie

Ohhh so that explains why I couldn't load it on my RTX 4090, while other 7B models load just fine!

Haha really? I almost added the caveat that I didn't count the parameters myself. And I couldn't see the weights file size because it requires login (because of their restrictive licensing choice). If true and it's 9B that's really dishonest.
Yes, it's 8.5B params if you account for weight tying, and 9.3B if you count the embedding layer and output layer weights separately as shown in the 2nd figure in the article. In the paper, I think they justified 7B by only counting the non-embedding parameters (7,751,248,896), which is kind of cheating in my opinion, because if you do that, then Llama 2 is basically a 5B-6B param model.
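For anyone who wants to check the arithmetic, here is a minimal sketch of the three ways of counting. The non-embedding figure is the one quoted above; the vocabulary size (256,000) and hidden width (3,072) are assumptions taken from the publicly posted Gemma 7B config, not from this thread, and the function name is purely illustrative.

```python
# Rough parameter accounting for Gemma 7B under three counting conventions.
# ASSUMPTIONS (not stated in the thread): vocabulary size and hidden width.
VOCAB_SIZE = 256_000                   # assumed vocabulary size
HIDDEN_SIZE = 3_072                    # assumed hidden (embedding) width
NON_EMBEDDING_PARAMS = 7_751_248_896   # figure quoted in the comment above


def param_counts(non_embedding=NON_EMBEDDING_PARAMS,
                 vocab=VOCAB_SIZE,
                 hidden=HIDDEN_SIZE):
    """Return parameter totals for each counting convention."""
    embedding = vocab * hidden  # one vocab-by-hidden matrix
    return {
        # Transformer blocks only, embeddings excluded
        "non_embedding": non_embedding,
        # Input embedding and output projection share one matrix
        "tied": non_embedding + embedding,
        # Input embedding and output projection counted separately
        "untied": non_embedding + 2 * embedding,
    }


counts = param_counts()
for name, n in counts.items():
    print(f"{name:>14}: {n:,} ({n / 1e9:.2f}B)")
```

Under those assumptions the tied and untied totals round to the 8.5B and 9.3B figures above; a slightly different vocabulary size shifts the totals by well under 0.01B.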
Is the 2B measured like that as well? I did use it with llama.cpp and noticed it ran slower than I expected.

That's the danger of too much abstraction: it's easy to have big gaps in one's understanding of what's really going on.

Yes, it's somewhat similar for the 2B model, as it uses the same vocabulary size.
Practically speaking, I OOM'd running Gemma on a 3090 using a config that had VRAM to spare for Mistral 7B. It kinda surprised me at first, until I realized why.
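A back-of-the-envelope sketch of why the same config can fit one model and not the other: weight memory scales linearly with parameter count. The parameter totals below are assumptions (roughly 8.54B for Gemma with tied embeddings, roughly 7.24B for Mistral 7B), and the estimate covers fp16 weights only, ignoring the KV cache, activations, and framework overhead.

```python
# Weights-only VRAM estimate. Real usage is higher (KV cache, activations,
# framework overhead), so treat this as a lower bound.
BYTES_PER_PARAM_FP16 = 2

# ASSUMED parameter totals, not taken from this thread.
GEMMA_7B_PARAMS = 8_537_680_896     # assumed: non-embedding + tied embedding
MISTRAL_7B_PARAMS = 7_241_732_096   # assumed total for Mistral 7B


def weight_gib(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


gemma_gib = weight_gib(GEMMA_7B_PARAMS)
mistral_gib = weight_gib(MISTRAL_7B_PARAMS)

print(f"Gemma 7B   fp16 weights: {gemma_gib:.1f} GiB")
print(f"Mistral 7B fp16 weights: {mistral_gib:.1f} GiB")
print(f"Difference:              {gemma_gib - mistral_gib:.1f} GiB")
```

On a 24 GB card a gap of a couple of GiB in weights alone is enough to turn "VRAM to spare" into an OOM once the cache and activations are added on top.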