by wolfgangK 297 days ago
Only those who don't care/know about prompt processing speed are buying Macs for LLM inference.

I may well be someone who doesn't know or doesn't care, but buying a Mac also makes sense for people who want to keep their lookups private.
Even 40 tokens per second is plenty for real-time usage. The average person reads at ~4 words per second, and 40 tokens per second works out to roughly 15-20 words per second.
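For anyone who wants to check that arithmetic, here is a rough sketch. The words-per-token ratios (0.375 and 0.5) are my assumption, picked only because they reproduce the 15-20 words per second range; the real ratio depends on the tokenizer and the text. The function names are made up for illustration.

```python
def words_per_second(tokens_per_second: float, words_per_token: float) -> float:
    """Convert a generation rate in tokens/s to an approximate words/s rate."""
    return tokens_per_second * words_per_token


def reading_headroom(tokens_per_second: float, words_per_token: float,
                     reading_wps: float = 4.0) -> float:
    """How many times faster than a ~4 words/s reader the model emits words."""
    return words_per_second(tokens_per_second, words_per_token) / reading_wps


# Assumed ratios: chosen to match the 15-20 words/s estimate, not measured.
for ratio in (0.375, 0.5):
    wps = words_per_second(40, ratio)
    headroom = reading_headroom(40, ratio)
    print(f"{ratio} words/token: {wps:.0f} words/s, {headroom:.2f}x reading speed")
```

Under those assumptions, generation stays several times ahead of a typical reader, which is the point being made here.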

Even useful models like gemma3 27b are hitting 22 t/s on 4-bit quants.

You aren't going to be reformatting gigabytes of PDFs or anything, but for a lot of common use cases, those speeds are fine.