by mark_l_watson 941 days ago
Another data point: I can (barely) run a 30B 4-bit quantized model on a Mac Mini with 32 GB of on-chip memory, but it runs slowly, at a little under 10 tokens/second.

13B and 7B models run easily and much faster.
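A rough back-of-the-envelope sketch of why 30B is a tight fit in 32 GB: assuming the weights dominate memory use and take about 4 bits each, the weights alone need roughly params × bits / 8 bytes. The parameter counts here are the nominal 7B/13B/30B sizes (an assumption), and the estimate ignores the KV cache, runtime buffers, per-block quantization scales (which push real 4-bit formats somewhat above 4 bits per weight), and whatever the OS itself is using.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1e9 bytes).

    Assumption: ignores KV cache, activations, quantization scale
    overhead, and OS/runtime memory.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts (assumed); real models differ slightly.
for n_params in (7e9, 13e9, 30e9):
    gb = weight_memory_gb(n_params, 4)
    print(f"{n_params / 1e9:.0f}B @ 4-bit: ~{gb:.1f} GB of weights")
```

This gives about 3.5, 6.5, and 15 GB. Since generating each token reads roughly all of the weights once, the smaller models also tend to generate noticeably faster, which is consistent with what I see.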