by benbojangles
25 days ago
Gemma 4 because, presumably, it does image analysis, right?

- 31b: it's a dense model.
- How many tokens/s is it running at?
- What temps are the M1 Max GPU/CPU running at?
- Is it MLX or GGUF?
- Why 31b and not 26b, which is MoE and much more efficient on the M1 Max, at 50 tokens/s and low temps?

I personally use (MLX) qwen3.6-35b-8bit mostly, but use Gemma-4-26b-4bit for image analysis. It's mind-blowing how fast it is at identifying the scene in a photograph.