Gemma 7B is a 9B model. The name is a lie. Then they really played games with Gemma 2B as well.
I don't get how Google can be this incompetent and far behind everyone else. They have amazing people and the kinds of resources that almost no one else does but somehow need to resort to faking demos, blatant lies about model sizes, etc.
Google used to be the place everyone wanted to go. Someone at Google AI needs to be fired so they can start being productive again.
Haha really? I almost added the caveat that I didn't count the parameters myself. And I couldn't see the weights file size because it requires login (because of their restrictive licensing choice). If true and it's 9B that's really dishonest.
Yes, it's 8.5B params if you account for weight tying, and 9.3B if you count the embedding layer and output layer weights separately as shown in the 2nd figure in the article. In the paper, I think they justified 7B by only counting the non-embedding parameters (7,751,248,896), which is kind of cheating in my opinion, because if you do that, then Llama 2 is basically a 5B-6B param model.
Practically, speaking, I OOM'd running Gemma on a 3090 using a config that had VRAM to spare for Mistral 7B. It kinda surprised me at first, until I realized why.
Gemma 7B is a 9B model. The name is a lie. Then they really played games with Gemma 2B as well.
I don't get how Google can be this incompetent and far behind everyone else. They have amazing people and the kinds of resources that almost no one else does but somehow need to resort to faking demos, blatant lies about model sizes, etc.
Google used to be the place everyone wanted to go. Someone at Google AI needs to be fired so they can start being productive again.