Hacker News
by accrual 19 days ago
Splendid model; it reminds me of Gemma3 27B, which was my favorite local model last year. In my experience Gemma always had a bit more warmth/empathy than Qwen and Mistral, and I found it more useful for personal questions.

My system has a 4080 Super (16GB), and with llama.cpp (b9333-35c9b1f39) I got these results on a test prompt:

* Qwen3.5-9B-Q6_K.gguf - Prompt: 1492.0 t/s | Generation: 81.0 t/s

* gemma-4-12b-it-Q4_K_M.gguf - Prompt: 1329.2 t/s | Generation: 72.3 t/s

* gemma-4-12b-it-Q8_0.gguf - Prompt: 504.4 t/s | Generation: 25.2 t/s
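
To make the gap easier to see, here is a rough sketch that expresses each result relative to the Q4_K_M Gemma run. The numbers are copied from the list above; the `relative_speed` helper and the choice of baseline are just illustrative assumptions, not part of llama.cpp.

```python
# Throughput figures copied from the list above (tokens per second).
results = {
    "Qwen3.5-9B-Q6_K.gguf": {"prompt": 1492.0, "generation": 81.0},
    "gemma-4-12b-it-Q4_K_M.gguf": {"prompt": 1329.2, "generation": 72.3},
    "gemma-4-12b-it-Q8_0.gguf": {"prompt": 504.4, "generation": 25.2},
}


def relative_speed(results, baseline):
    """Hypothetical helper: divide each t/s figure by the baseline model's figure."""
    base = results[baseline]
    return {
        name: {phase: round(tps / base[phase], 2) for phase, tps in figures.items()}
        for name, figures in results.items()
    }


# Baseline chosen arbitrarily for illustration.
ratios = relative_speed(results, "gemma-4-12b-it-Q4_K_M.gguf")
for name, r in ratios.items():
    print(f"{name}: prompt x{r['prompt']:.2f}, generation x{r['generation']:.2f}")
```

On these figures the Q8_0 file comes out at about 0.38x the prompt speed and 0.35x the generation speed of Q4_K_M, while Qwen3.5-9B at Q6_K is about 1.12x on both.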