|
|
|
|
|
by austinvhuang
842 days ago
|
|
Yes - thanks for pointing that out. The README is being updated, you can see an updated WIP in the dev branch: https://github.com/google/gemma.cpp/tree/dev?tab=readme-ov-f...
and improving error messages is a high priority. The weights should be the same across formats, but it's easy for differences to arise due to quantization and/or subtle implementation differences. Minor implementation differences has been a pain point in the ML ecosystem for a while (w/ IRs, onnx, python vs. runtime, etc.), but hopefully the differences aren't too significant (if they are, it's a bug in one of the implementations). There were quantization fixes like https://twitter.com/ggerganov/status/1760418864418934922 and other patches happening, but it may take a few days for patches to work their way through the ecosystem. |
|
I'm using the 32-bit GGUF model from the Google repo, not a different quantized model, so I could have one less source of error. It's hard to tell with LLMs if its a bug. It just gives slightly stranger answers sometimes, but it's not completely gibberish. or incoherent sentences or have extra punctuations like with some other LLM bugs I've seen.
Still, I'll wait a few days to build llama.cpp again to see if there are any changes.