by heinrichf
461 days ago
I'm comparing Gemma3 12B (https://ollama.com/library/gemma3; running fully on my 3060 12GB) and Mistral Small 3 24B (https://ollama.com/library/mistral-small; 10% offloaded to the CPU).

- Gemma3 12B: ~100 t/s on prompt eval; 15 t/s on eval
- Mistral Small 3 24B: ~500 t/s on prompt eval; 10 t/s on eval

Do you know what difference in architecture could make the prompt eval (prefill) so much slower on the 2x smaller Gemma3 model?
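For anyone who wants to reproduce the comparison, here is a rough sketch of how the two rates can be derived from an Ollama response. It assumes the response carries the `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration` fields, with durations in nanoseconds. The helper name `rates` and the values in `sample` are made up for illustration; they are not measurements.

```python
def rates(resp: dict) -> dict:
    """Return prefill and decode throughput in tokens per second.

    Assumes `resp` uses Ollama-style timing fields, with durations
    reported in nanoseconds.
    """
    ns_per_s = 1e9
    prompt_seconds = resp["prompt_eval_duration"] / ns_per_s
    eval_seconds = resp["eval_duration"] / ns_per_s
    return {
        "prompt_eval_tps": resp["prompt_eval_count"] / prompt_seconds,
        "eval_tps": resp["eval_count"] / eval_seconds,
    }


# Illustrative values only (not real measurements).
sample = {
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 10_000_000_000,  # 10 s in nanoseconds
    "eval_count": 150,
    "eval_duration": 10_000_000_000,  # 10 s in nanoseconds
}
print(rates(sample))
```

Prefill and decode are timed separately, so a model can be quick at one and slow at the other, which is the pattern in the numbers above.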