by kristianp
15 days ago
What quantisation do the creators intend this to be run at? They talk about 16GB of RAM, so should it be run at 8 bit? People here are talking about using q4, but I would have thought a smaller model like this wouldn't perform well at such low bits per parameter. Edit: it looks like their benchmarks would have been done at 16-bit float, as the Hugging Face release is that size: https://huggingface.co/google/gemma-4-12B . Which is a little deceptive: they're advertising that an 8-bit size will fit on 16GB laptops, while releasing a 16-bit size. I guess we have to wait for someone to produce perplexity curves at different quantisation levels.
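For a rough sense of the sizes involved, here is a back-of-the-envelope sketch. It assumes the "12B" in the model name means about 12 billion parameters, counts the weights only (no KV cache, activations, or runtime overhead), and uses decimal gigabytes. The function name `weight_size_gb` is just illustrative.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: parameter count read off the "12B" in the model name.
PARAMS = 12e9
RAM_GB = 16  # the laptop size mentioned in the comment

for bits in (16, 8, 4):
    size = weight_size_gb(PARAMS, bits)
    verdict = "under" if size < RAM_GB else "over"
    print(f"{bits:>2}-bit: ~{size:.0f} GB of weights ({verdict} {RAM_GB} GB)")
```

Under those assumptions only the 8-bit and 4-bit versions come in under 16GB, and real usage would be higher once context and the OS are accounted for.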