by benbojangles 25 days ago
Gemma 4, presumably because it does image analysis, right?

- 31b is a dense model.

- How many tokens/s is it running at?

- What temps are the M1 Max GPU/CPU running at?

- Is it MLX or GGUF?

- Why 31b and not 26b, which is MoE and much more efficient on the M1 Max (50 tokens/s and low temps)?

I personally use (MLX) qwen3.6-35b-8bit mostly, but use Gemma-4-26b-4bit for image analysis. It's mind-blowing how fast it is at identifying the scene in a photograph.
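
For anyone wondering why the dense vs MoE point matters so much for speed, here is a rough back-of-envelope sketch. It assumes decoding is memory-bandwidth bound, so every generated token has to read all *active* weights once. The 400 GB/s figure is Apple's quoted peak for the top M1 Max; the 4B active-parameter count for the MoE model is a placeholder I made up for illustration, not a confirmed spec, and `decode_tokens_per_s` is just a hypothetical helper name.

```python
def decode_tokens_per_s(bandwidth_gb_s: float,
                        active_params_b: float,
                        bits_per_weight: float) -> float:
    """Rough ceiling on decode speed when memory bandwidth is the bottleneck.

    Ignores KV cache reads, attention cost, and MoE routing overhead,
    so real-world numbers will be noticeably lower.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


M1_MAX_BANDWIDTH_GB_S = 400   # quoted peak; sustained bandwidth is lower
ASSUMED_MOE_ACTIVE_B = 4      # ASSUMPTION: illustrative active-parameter count

# Dense 31B at 4-bit: all 31B weights are read for every token.
dense = decode_tokens_per_s(M1_MAX_BANDWIDTH_GB_S, 31, 4)

# MoE at 4-bit: only the active experts' weights are read per token.
moe = decode_tokens_per_s(M1_MAX_BANDWIDTH_GB_S, ASSUMED_MOE_ACTIVE_B, 4)

print(f"dense ceiling: {dense:.1f} tok/s")
print(f"moe ceiling:   {moe:.1f} tok/s")
```

These are ceilings, not predictions, but the ratio is the point: the dense model moves several times more bytes per token, which is also why it tends to run hotter for the same output.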