Hacker News
by irusensei, 113 days ago
I noticed that, even on my M3, MLX tends to do prefill a lot faster than llama.cpp with GGML models. Does anyone know how they do it?