by int_19h 1138 days ago
It's not even that bad. A Core i7-12700K with DDR5 gives me ~1 word per second on llama-30b, which is fast enough for real-time chat, with some patience. And things are even better on M1/M2 Macs.

The critical factor seems to be the ability to fit the whole model in RAM (--mlock option in oobabooga). With Apple's RAM prices, most M1/M2 owners probably don't have the 32 GB of RAM required to fit a 4-bit 30B model.
I have 64 GB RAM, but only a Ryzen 5 3600, and the larger models are very slow ;)
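
For anyone wanting to sanity-check the RAM figure, here is a rough back-of-the-envelope sketch. It assumes exactly 30e9 parameters and exactly 4 bits per weight, which is a simplification: real quantized formats store extra scaling data, and the context cache, the OS, and other apps need memory too. The function name `weights_gib` is made up for illustration.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in GiB (ignores all overhead)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed figures: a "30B" model taken as exactly 30e9 parameters.
N_PARAMS = 30e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weights_gib(N_PARAMS, bits):.1f} GiB")
```

By this estimate the 4-bit weights alone come to about 14 GiB, so a 16 GB machine has almost nothing left over once everything else is loaded, and 32 GB is the next common size up.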