


	by dust42 43 days ago
	For many models, llama.cpp on Mac runs 20-40% slower than MLX. Did you try MLX? At least on HF there are MLX 2-bit quants. Unfortunately I only have 64GB, so I can't test it.

1 comment

I'm not using llama.cpp there; it's my own inference engine, written specifically for DeepSeek v4. The goal is to optimize it as much as possible.

That's cool!

I knew the name sounded familiar, thank you for SDS!