by egorfine
59 days ago
I was comparing various models on an M5 Pro with 48GB RAM, MLX vs GGUF, and found that MLX models have a higher time to first token (sometimes by an order of magnitude), while tokens/sec and memory usage are the same as with GGUF.

Gemma 3 27B q4:
* MLX: 16.7 t/s, 1220ms ttft
* GGUF: 16.4 t/s, 760ms ttft

Gemma 4 31B q8:
* MLX: 8.3 t/s, 25000ms ttft
* GGUF: 8.4 t/s, 1140ms ttft

Gemma 4 A4B q8:
* MLX: 52 t/s, 1790ms ttft
* GGUF: 51 t/s, 380ms ttft

All comparisons were done in LM Studio, with the latest versions of everything.
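
For anyone who wants the gap as a ratio, here is a rough sketch that divides the MLX figures by the GGUF figures quoted above. The `results` layout and the `ratios` helper are illustrative names only, not part of LM Studio or any benchmark tool, and the numbers are simply copied from this comment.

```python
# Figures copied from the comment above: (tokens/sec, time to first token in ms).
# The dictionary layout is an illustrative assumption, not a tool's output format.
results = {
    "Gemma 3 27B q4": {"MLX": (16.7, 1220), "GGUF": (16.4, 760)},
    "Gemma 4 31B q8": {"MLX": (8.3, 25000), "GGUF": (8.4, 1140)},
    "Gemma 4 A4B q8": {"MLX": (52, 1790), "GGUF": (51, 380)},
}


def ratios(entry):
    """Return (TTFT ratio, throughput ratio), each as MLX divided by GGUF."""
    (mlx_tps, mlx_ttft), (gguf_tps, gguf_ttft) = entry["MLX"], entry["GGUF"]
    return mlx_ttft / gguf_ttft, mlx_tps / gguf_tps


summary = {model: ratios(entry) for model, entry in results.items()}

for model, (ttft_ratio, tps_ratio) in summary.items():
    print(f"{model}: TTFT {ttft_ratio:.1f}x, throughput {tps_ratio:.2f}x")
```

On these numbers the TTFT ratio ranges from about 1.6x to about 22x, while throughput stays within a few percent of parity.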