HN Mirror



	by petercooper 8 days ago
	Its image processing is terrible. I ran several tests comparing it with Qwen 3.5 0.8b (yes, 7% the size) and Qwen beat it every time, with Gemma often getting things entirely wrong. I even gave it a plain image saying "This is a test" and it thought for 6 minutes trying to analyze it and failed. Qwen 3.5 0.8b confidently got it in under a second. It may be that the Q6 quant I got is borked (or my LM Studio is), but either way, the 0.8b's performance is mind-boggling in comparison.

6 comments

CMay 8 days ago

For Qwen 3.5 0.8B presumably you're running it unquantized, because it's so small. Get at least the Q8 of Gemma 4 12B with the F32 mmproj and use an f16 kv cache.

Then run it with the latest llama.cpp that contains the Gemma 4 12B unified bug fixes, using --image-min-tokens 560 --image-max-tokens 2240 --batch-size 4096 --ubatch-size 4096 --temp 1.0 --top-p 0.95 --top-k 64 --jinja

It's understanding far more complex things for me and can reliably handle tiny text, so it should be easily understanding an image that only contains the text "This is a test".
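The settings in the comment above can be collected into one invocation. This is a minimal sketch, not a verified recipe: the sampling, batching and image-token flags are copied from the comment, while the `llama-server` binary name, the `--model`, `--mmproj`, `--cache-type-k` and `--cache-type-v` flags, and both file names are assumptions added for illustration. The script only builds and prints the command; it does not run it.

```python
import shlex

# Hypothetical file names: substitute the GGUF files you actually downloaded.
MODEL_PATH = "gemma-4-12b-Q8_0.gguf"        # assumed name for the Q8 quant
MMPROJ_PATH = "mmproj-gemma-4-12b-F32.gguf"  # assumed name for the F32 mmproj


def build_command(binary="llama-server"):
    """Return the argument list for a llama.cpp run with the suggested settings."""
    return [
        binary,                       # assumption: the comment names no binary
        "--model", MODEL_PATH,        # assumption: not spelled out in the comment
        "--mmproj", MMPROJ_PATH,      # assumption: not spelled out in the comment
        "--cache-type-k", "f16",      # assumption: one way to request an f16 KV cache
        "--cache-type-v", "f16",
        # Everything below is taken verbatim from the comment.
        "--image-min-tokens", "560",
        "--image-max-tokens", "2240",
        "--batch-size", "4096",
        "--ubatch-size", "4096",
        "--temp", "1.0",
        "--top-p", "0.95",
        "--top-k", "64",
        "--jinja",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line to copy into a terminal.
    print(shlex.join(build_command()))
```

Building the list first makes it easy to change one setting at a time, for example the image token limits, when comparing results against the failing setup.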

usef- 8 days ago

That sounds like a bug. They're very common for open model releases on the first day. If I wasn't on mobile I'd try it on Google's own app.

JacobAsmuth 7 days ago

Sounds like you're doing it wrong, to be honest.

ma2kx 8 days ago

I guess Google implements more and stronger guardrails than Alibaba, and that confuses these small models. At least this was my impression with the Gemma 3 models, which often said that the image contained nudity or sex scenes and that they therefore could not describe it. Never understood the point of this behavior...

jimmy76615 8 days ago

The biggest problem with all the Google models has always been RLHF, particularly safety training. They take a good, smart model and make it behave like a corporate person who has been to far too many forced anti-{sexism, racism...} seminars, so that it now lives in fear of saying something that could be construed as wrong by some moral standard.

staticman2 8 days ago

This is almost certainly not true.

If it were, they wouldn't need the classifiers they use to warn Gemini about problematic prompts.

thot_experiment 8 days ago

I've always found the Gemma models to vastly underperform on vision tasks compared to Qwen, so that's nothing new.

mountainriver 8 days ago

The Qwen series adopted vision wayyy earlier than anyone else. No idea why the other labs were sleeping on it, but Qwen had about 2 years of experimentation without any competition.

staticman2 7 days ago

Test it on a professional inference provider to rule out trouble on your end.
