by Ambix 1143 days ago
It's not about the number of threads; it's about the memory bottleneck. The sweet spot on my M1 Pro laptop is around 6 threads with a 4-bit model. I've managed to get 20 tokens per second, which is really impressive.
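A rough way to see why thread count stops mattering is a roofline-style estimate. Generating each token reads every weight from memory once, so throughput is capped at roughly bandwidth divided by model size, no matter how many cores are busy. The sketch below assumes a hypothetical 7B-parameter model, 4-bit weights with no quantization overhead, the M1 Pro's advertised ~200 GB/s peak bandwidth, and a made-up per-thread compute rate. None of these numbers come from the comment except "4-bit".

```python
# Back-of-envelope roofline sketch for single-stream token generation.
# ASSUMPTIONS (illustrative only): 7B parameters, pure 4-bit weights,
# ~200 GB/s peak memory bandwidth, and a hypothetical 10 tokens/s per thread.

def bytes_per_token(n_params: float, bits_per_weight: float) -> float:
    """Bytes read per generated token: every weight is streamed once."""
    return n_params * bits_per_weight / 8


def tokens_per_sec(threads: int, per_thread_rate: float,
                   bandwidth_bytes: float, weight_bytes: float) -> float:
    """Throughput is the smaller of the compute limit and the memory limit."""
    compute_bound = threads * per_thread_rate
    memory_bound = bandwidth_bytes / weight_bytes
    return min(compute_bound, memory_bound)


N_PARAMS = 7e9       # hypothetical model size
BITS = 4             # 4-bit quantization, as in the comment
BANDWIDTH = 200e9    # bytes/s, advertised M1 Pro peak
PER_THREAD = 10.0    # hypothetical tokens/s a single thread could compute

w = bytes_per_token(N_PARAMS, BITS)
for t in (1, 2, 4, 6, 8, 10):
    # Past a few threads the memory bound dominates and the curve goes flat.
    print(t, round(tokens_per_sec(t, PER_THREAD, BANDWIDTH, w), 1))
```

Under these assumptions the ceiling is about 57 tokens/s, reached at around 6 threads; extra threads add nothing. Real throughput (20 tokens/s here) sits well below a peak-bandwidth ceiling, and this simple model shows only the plateau, not the slowdown that contention can cause with too many threads.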