by yobanate 914 days ago
Can confirm. My M3 Max gets about 22t/s, putting the bottleneck BKAC.

That's a 10x speed increase. What's the secret behind Apple's M3? Faster-clocked RAM? Specific AI hardware?
Unified memory and optimizations in llama.cpp (which Ollama wraps).
Is that using the GPU?
It's configurable. There are details in the repo, but llama.cpp uses Metal, Apple's GPU API, so yes, it can run on the GPU.
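
On the unified memory point above: a common rule of thumb is that token generation is limited by memory bandwidth, since roughly all of the model weights are read once per generated token. Here is a rough back-of-envelope sketch of that bound. The function name and both numbers are made up for illustration, not measurements of any particular machine or model, and real throughput will land below this ceiling.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode speed, assuming every weight is read once per token."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical inputs, chosen only to show the arithmetic.
bandwidth_gb_s = 400.0  # assumed memory bandwidth in GB/s
model_size_gb = 16.0    # assumed size of the quantized weights in GB

ceiling = max_tokens_per_second(bandwidth_gb_s, model_size_gb)
print(f"{ceiling:.1f} t/s upper bound")
```

If the estimate holds, it would explain why memory that both the CPU and GPU can read at high bandwidth matters more for this workload than raw clock speed.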