Hacker News
by Ambix, 1143 days ago
It's not about the number of threads, it's about the memory bottleneck. The sweet spot for my M1 Pro laptop is around 6 threads with a 4-bit model. I've managed to get 20 tokens per second, which is really impressive.
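A rough back-of-the-envelope sketch of why memory is the limit, not a measurement: it assumes generating each token reads every weight from memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The 200 GB/s figure is Apple's quoted M1 Pro memory bandwidth; the 7B parameter count is only an assumption for illustration (the model size isn't stated above), and `max_tokens_per_sec` is a made-up helper name.

```python
def max_tokens_per_sec(bandwidth_gb_s: float,
                       n_params_billion: float,
                       bits_per_weight: float) -> float:
    """Ceiling on decode speed if every weight is read once per token."""
    # Weights in GB: billions of parameters * bits per weight / 8 bits per byte.
    model_gb = n_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


# Assumed inputs: 200 GB/s bandwidth, hypothetical 7B-parameter model.
print(max_tokens_per_sec(200, 7, 4))   # 4-bit weights
print(max_tokens_per_sec(200, 7, 16))  # 16-bit weights, 4x more bytes per token
```

Under these assumptions the ceiling depends only on bytes moved per token, which is why dropping to 4-bit helps and why adding threads stops helping once the memory bus is saturated. Real throughput on CPU threads will land well below this bound.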