by alecco 3 hours ago
MacBooks have lots of RAM and no PCIe bottleneck, but roughly 10x fewer FLOP/s than a much cheaper Nvidia GPU. Test LLMs on rented GPUs first, on vast.ai or a similar service (beware storage, etc.). Don't spend thousands before trying it and knowing exactly what you'll get.

Also beware that local models tend to be slow. The main optimization trick for LLM inference is running large batches (concurrent users), and a single local user can't take advantage of this (batch=1).
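
To put rough numbers on the batching point, here is a back-of-the-envelope sketch. It assumes a simple roofline model: each decode step streams all the weights from memory once regardless of batch size, and each generated token costs about 2 FLOPs per parameter. It ignores KV-cache traffic, which grows with batch size. The function name and every hardware number are made up for illustration, not measurements of any real Mac or GPU.

```python
def decode_tokens_per_sec(params, bytes_per_param, mem_bw, flops, batch):
    """Rough roofline estimate of total decode throughput in tokens/s."""
    # Every decode step reads all weights from memory once, whatever the batch size.
    t_memory = params * bytes_per_param / mem_bw
    # Each generated token costs roughly 2 FLOPs per parameter.
    t_compute = batch * 2 * params / flops
    # The step takes as long as the slower of the two.
    step_time = max(t_memory, t_compute)
    return batch / step_time


# Hypothetical numbers, chosen only to show the shape of the curve.
PARAMS = 7e9          # 7B-parameter model
BYTES_PER_PARAM = 2   # 16-bit weights
MEM_BW = 400e9        # memory bandwidth, bytes/s (assumed)
FLOPS = 20e12         # usable compute, FLOP/s (assumed)

for batch in (1, 8, 64):
    total = decode_tokens_per_sec(PARAMS, BYTES_PER_PARAM, MEM_BW, FLOPS, batch)
    print(f"batch={batch:3d}  total tokens/s ~ {total:8.1f}")
```

In this model, total throughput scales almost linearly with batch size until the compute term takes over, which is why a provider serving many users gets far more out of the same hardware than one person at batch=1.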

IMHO using Macs for LLMs is a fad. An expensive fad.