Hacker News
by
dust42
43 days ago
For many models the performance of llama.cpp on Mac is 20-40% lower than MLX. Did you try MLX? At least on HF there are MLX 2-bit quants. Unfortunately I have only 64GB, so I can't test it.
antirez
43 days ago
I'm not using llama.cpp there; it's my own inference engine, specific to DeepSeek v4. The goal is to optimize it as much as possible.
oveja
38 days ago
That's cool!
I knew the name sounded familiar, thank you for SDS!