Hacker News
by asimovDev, 105 days ago
I am running the 80B Qwen Coder Next model (4-bit quant, MLX version) on a 96GB M3 MacBook, and it responds quickly, almost immediately. The model plus 128k of context fits comfortably in memory.
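For a rough sense of why this fits, here is a back-of-the-envelope estimate of the weight footprint. It is only a sketch: the 4.5 bits-per-weight figure is an assumption (4-bit values plus roughly half a bit per weight for per-group scales and biases), and the function and variable names are made up for illustration. It does not estimate the KV cache for the 128k context, since that depends on architecture details not given here.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 80e9        # 80B parameters, as stated in the comment
TOTAL_RAM_GB = 96      # unified memory on the MacBook

# Pure 4-bit weights, ignoring any quantization metadata.
raw_gb = weight_memory_gb(N_PARAMS, 4.0)

# ASSUMPTION: about 0.5 extra bits per weight for per-group scales/biases.
quantized_gb = weight_memory_gb(N_PARAMS, 4.5)

# Whatever is left has to cover the context cache, the OS, and other apps.
headroom_gb = TOTAL_RAM_GB - quantized_gb

print(f"raw 4-bit weights:     {raw_gb:.1f} GB")
print(f"with assumed overhead: {quantized_gb:.1f} GB")
print(f"remaining headroom:    {headroom_gb:.1f} GB")
```

Under these assumptions the weights take a bit under half of the 96GB, which is consistent with having room left over for a long context.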