Hacker News
by irusensei 113 days ago
I noticed that, even on my M3, MLX tends to do prefill a lot faster than llama.cpp does with GGML models. Does anyone know how they do it?