Hacker News
by dust42 43 days ago
For many models, llama.cpp performance on Mac is 20-40% lower than MLX. Did you try MLX? At least on HF there are MLX 2-bit quants. Unfortunately I only have 64GB, so I can't test it.

I'm not using llama.cpp there; it's my inference engine, which is DeepSeek v4 specific. The goal is to optimize it as much as possible.
That's cool!

I knew the name sounded familiar, thank you for SDS!