| HN Mirror

	by dzr0001 71 days ago
	My token throughput is much better with vLLM-mlx on my M2 Ultra than with llama.cpp. It might be worth giving it a try.