HN Mirror


	by abhashanand1501 5 days ago
	Do small language models run on CPUs, or do you still need a GPU to run them?

3 comments

wongarsu 4 days ago

Anything below one billion parameters can run on the CPU at an acceptable speed.

For larger sizes you still can; it just becomes slower and slower. For a simple classification task (small input, tiny output, and you can constrain the output to a couple of tokens) you could even run something like a 4B or 8B model on the CPU.


a96 4 days ago

I guess that technically depends on the software used to run the model, but in general it's always been possible to run on a CPU (and it may even be possible to run on a TPU or something else). It's just been slower. Likewise, GPU RAM vs. system RAM and the bandwidths involved can create hard bottlenecks.

A GPU with VRAM (or fast unified RAM) is generally the option that is both available and performant, but really small models in particular also run quite well on CPU and system RAM.
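
Whether a model fits in VRAM or system RAM at all can be estimated from its parameter count. The snippet below is a minimal sketch, assuming the weights dominate memory use and ignoring the KV cache, activations, and runtime overhead. The function name and the example sizes are illustrative, not taken from any particular tool.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Illustrative combinations of model size and precision.
    for label, params, bits in [
        ("1B at 16-bit", 1e9, 16),
        ("8B at 16-bit", 8e9, 16),
        ("8B at 4-bit", 8e9, 4),
    ]:
        print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, a 1B model at 16-bit precision needs about 2 GB for its weights, which fits comfortably in ordinary system RAM.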


avadodin 4 days ago

iGPUs are often slower than, or only as fast as, CPUs when it comes to LLM text generation.

The advantage is mainly in memory bandwidth. A discrete GPU's on-board memory is considerably faster than the DDR attached to your CPU.

Other types of "AI" models do make use of the extra compute in GPUs, but LLM text generation mostly does not.
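
The bandwidth argument can be turned into a rough back-of-envelope estimate. This is a minimal sketch, assuming a dense model at batch size 1 where every generated token requires reading all the weights from memory once, and ignoring KV cache traffic, prompt processing, and compute limits. The function name and the bandwidth figures are illustrative round numbers, not measurements of any specific hardware.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generation speed when memory bandwidth is the bottleneck.

    Assumption: each new token streams the full set of weights once.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 4.0  # e.g. an 8B model quantized to roughly 4 bits per weight
    # Hypothetical bandwidths chosen only to show the scaling.
    for label, bandwidth in [("system RAM", 50.0), ("GPU VRAM", 500.0)]:
        rate = decode_tokens_per_second(bandwidth, weights_gb)
        print(f"{label} at {bandwidth:.0f} GB/s: at most {rate:.1f} tokens/s")
```

The bound scales linearly with bandwidth and inversely with model size, which is why small models stay usable on CPU and system RAM while larger ones slow down.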
