| HN Mirror



	by Semaphor 1138 days ago
	> The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM. Bigger ones as well, you just have to wait longer. Nothing for real-time usage, but if you can wait 10-20 minutes, you can use them on CPU.


int_19h 1138 days ago

It's not even that bad. Core i7-12700K with DDR5 gives me ~1 word per second on llama-30b - that is fast enough for real-time chat, with some patience. And things are even better on M1/M2 Macs.


Joeri 1138 days ago

The critical factor seems to be the ability to fit the whole model in RAM (the --mlock option in oobabooga). With Apple's RAM prices, most M1/M2 owners probably don't have the 32 GB of RAM required to fit a 4-bit 30B model.
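
For a rough sense of the numbers, here is a back-of-the-envelope sketch of the weight footprint. It assumes the nominal parameter count (30e9) and exactly 4 bits per weight; real quantized files store extra scaling data, so they come out somewhat larger. The `overhead_gib` parameter is a made-up placeholder for the OS, context buffers, and everything else, not a measured value.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_ram(n_params: float, bits_per_weight: float,
                ram_gib: float, overhead_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed fixed overhead vs. installed RAM.

    overhead_gib is a hypothetical allowance, not a measured figure.
    """
    return weights_gib(n_params, bits_per_weight) + overhead_gib <= ram_gib


if __name__ == "__main__":
    # Nominal 30B model at 4 bits per weight (assumed, ignores format overhead).
    print(f"weights only: {weights_gib(30e9, 4):.2f} GiB")
    for ram in (8, 16, 32, 64):
        print(f"{ram} GiB RAM -> fits: {fits_in_ram(30e9, 4, ram)}")
```

This only counts weights plus a guessed allowance, so a result close to the installed RAM should be read as "probably won't fit in practice" once the OS and other apps are accounted for.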


Semaphor 1138 days ago

I have 64 GB RAM, but only a Ryzen 5 3600, and the larger models are very slow ;)
