Hacker News
by dzr0001, 24 days ago
My token throughput is much better with vLLM-mlx on my M2 Ultra than with llama.cpp. It might be worth a try.