by leminimal 850 days ago

Kudos on your release! I know this was just made available, but a few notes:

- Somewhere in the README, consider mentioning that non-sfp weights need the `-DWEIGHT_TYPE=hwy::bfloat16_t` flag. Maybe around step 3.

- The README should explicitly say somewhere that there's no GPU support (at the moment).

- "Failed to read cache gating_ein_0 (error 294)" is pretty obscure. I think even "(error at line number 294)" would be a big improvement when it fails to FindKey.

- There's something odd about the 2b vs 7b model. The 2b will claim it's trained by Google but the 7b won't. Were these trained on the same data?

- Are the .sbs weights the same weights as the GGUF? I'm getting different answers compared to llama.cpp. Do you know of a good way to compare the two? Any way to make both deterministic? Or even dump probability distributions on the first (or any) token to compare?
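One way to make the comparison in the last point concrete is sketched below. It assumes you have already managed to dump the raw next-token logits from each runtime for the same prompt (how to do that is not shown here, and no gemma.cpp or llama.cpp API is used). The function names and metrics are illustrative choices, not anything either project provides. Running both sides with greedy decoding (always picking the top token) is the usual way to remove sampling randomness before comparing.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0) for tiny probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def top_k(probs, k):
    """Indices of the k most probable tokens, most probable first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def compare(logits_a, logits_b, k=5):
    """Summarize how far apart two next-token distributions are.

    logits_a / logits_b are hypothetical dumps from two runtimes,
    indexed by the same vocabulary.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("logit vectors must cover the same vocabulary")
    k = min(k, len(logits_a))
    p, q = softmax(logits_a), softmax(logits_b)
    return {
        "kl": kl_divergence(p, q),
        "max_abs_diff": max(abs(a - b) for a, b in zip(p, q)),
        "same_argmax": top_k(p, 1) == top_k(q, 1),
        "top_k_overlap": len(set(top_k(p, k)) & set(top_k(q, k))) / k,
    }
```

A KL value near zero with matching argmax would suggest the two implementations agree on that token; a large value on the very first token would point at a weight or tokenization mismatch rather than accumulated drift.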

1 comments

austinvhuang 850 days ago

Yes - thanks for pointing that out. The README is being updated; you can see an updated WIP in the dev branch: https://github.com/google/gemma.cpp/tree/dev?tab=readme-ov-f... Improving error messages is also a high priority.

The weights should be the same across formats, but it's easy for differences to arise due to quantization and/or subtle implementation differences. Minor implementation differences have been a pain point in the ML ecosystem for a while (w/ IRs, onnx, python vs. runtime, etc.), but hopefully the differences aren't too significant (if they are, it's a bug in one of the implementations).
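As a rough illustration of how reduced precision alone can shift results, the sketch below rounds float32 values to bfloat16 (the type named in the `-DWEIGHT_TYPE` flag above) and compares a dot product before and after. This is a toy simulation with made-up numbers; it does not reproduce the SFP format, any GGUF quantization scheme, or either project's actual kernels, and it only handles finite inputs.

```python
import struct


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value, returned as a float.

    bfloat16 keeps the top 16 bits of the float32 encoding, so this
    rounds (nearest, ties to even) and zeroes the low 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b):
    """Plain dot product in Python floats."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical weights and activations, chosen only for illustration.
weights = [0.1, -0.37, 1.234, 0.0051]
activations = [2.0, 0.5, -1.25, 300.0]

full = dot(weights, activations)
reduced = dot([to_bfloat16(w) for w in weights], activations)
print("full precision:", full)
print("bfloat16 weights:", reduced)
print("absolute difference:", abs(full - reduced))
```

Each rounded weight is off by a small relative amount, and those errors accumulate across the many layers of a real model, which is one reason two correct implementations can still produce slightly different outputs.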

There were quantization fixes like https://twitter.com/ggerganov/status/1760418864418934922 and other patches happening, but it may take a few days for patches to work their way through the ecosystem.

leminimal 848 days ago

Thanks, I'm glad to see your time machine caught my comment.

I'm using the 32-bit GGUF model from the Google repo, not a different quantized model, so that I have one less source of error. It's hard to tell with LLMs if it's a bug. It just gives slightly stranger answers sometimes, but it's not complete gibberish, and there are no incoherent sentences or extra punctuation like with some other LLM bugs I've seen.

Still, I'll wait a few days to build llama.cpp again to see if there are any changes.