Hacker News
by asimovDev 105 days ago
I am running the 80B Qwen Coder Next model (4-bit MLX quant) on a 96 GB M3 MacBook, and it responds quickly, almost immediately. The model plus a 128k context fits comfortably in memory.
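A rough back-of-the-envelope check of why this fits: quantized weights take about params × bits / 8 bytes, and a standard KV cache grows linearly with context length. The sketch below is only an estimate under assumed numbers. The ~4.5 effective bits per weight (to cover quantization scales) and the attention config (12 full-attention layers, 2 KV heads, head_dim 256, fp16 cache) are hypothetical placeholders, not figures from the comment, and the real model's layout may differ.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for quantized weights (ignores runtime overhead)."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(context_len: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Standard KV cache: one K and one V vector per attention layer,
    per KV head, per token."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem * context_len


GB = 1e9

# Hypothetical: ~4.5 effective bits/weight for a 4-bit quant with scales.
weights = weight_bytes(80e9, 4.5)
# Hypothetical attention config for a 128k (131072-token) context.
kv = kv_cache_bytes(128 * 1024, n_attn_layers=12, n_kv_heads=2, head_dim=256)

print(f"weights  ~ {weights / GB:.1f} GB")
print(f"KV cache ~ {kv / GB:.1f} GB")
print(f"total    ~ {(weights + kv) / GB:.1f} GB of 96 GB")
```

With these assumed values the estimate comes to roughly 45 GB of weights plus about 3.2 GB of KV cache, leaving headroom on a 96 GB machine. Keeping weights and cache as separate functions makes it easy to substitute the real values from a model's config.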