by accrual
19 days ago
Splendid model. It reminds me of Gemma3 27B, which was my favorite local model last year. In my experience Gemma always had a bit more warmth/empathy than Qwen and Mistral, and I found it more useful for personal questions.

My system has a 4080 Super (16GB), and with llama.cpp (b9333-35c9b1f39) I got these results on a test prompt:

* Qwen3.5-9B-Q6_K.gguf - Prompt: 1492.0 t/s | Generation: 81.0 t/s
* gemma-4-12b-it-Q4_K_M.gguf - Prompt: 1329.2 t/s | Generation: 72.3 t/s
* gemma-4-12b-it-Q8_0.gguf - Prompt: 504.4 t/s | Generation: 25.2 t/s