by rasbt
848 days ago
> Gemma is a +9B model

Yes, that's correct. It's 9.3B parameters if you count the embedding layer and the final projection layer separately. However, since they used weight tying, the two layers share one weight matrix, so the adjusted count is 8.5B, as discussed in the article.
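For anyone who wants to check the arithmetic, here is a rough sketch of how the two counts relate. The vocabulary size (256,000) and hidden dimension (3,072) are my assumptions about the model config, not numbers from the comment above, and the 9.3B total is the rounded figure, so treat the output as approximate.

```python
def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Number of weights in one vocab_size x hidden_dim matrix."""
    return vocab_size * hidden_dim


def tied_count(untied_total: int, vocab_size: int, hidden_dim: int) -> int:
    """Parameter count when the input embedding and the output projection
    share a single weight matrix, so that matrix is counted only once."""
    return untied_total - embedding_params(vocab_size, hidden_dim)


# Assumed config values (not stated in the thread).
VOCAB_SIZE = 256_000
HIDDEN_DIM = 3_072

# Rounded total with both layers counted separately, as quoted above.
UNTIED_TOTAL = 9_300_000_000

shared = embedding_params(VOCAB_SIZE, HIDDEN_DIM)
tied = tied_count(UNTIED_TOTAL, VOCAB_SIZE, HIDDEN_DIM)

print(f"shared matrix: {shared / 1e9:.2f}B")
print(f"untied count:  {UNTIED_TOTAL / 1e9:.1f}B")
print(f"tied count:    {tied / 1e9:.1f}B")
```

Under those assumptions the shared matrix is roughly 0.8B weights, which accounts for the gap between the two numbers.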