My apologies, I think the bit of context missing from my response is that you don't need a GPU at all; 64GB of RAM will suffice to run a quantized (e.g. 4-bit) 70B model on your CPU, and it won't even be -that- slow, you'll get a few tokens per second.

So while a lot of us think that you need to splurge in order to get into LLMs, the reality is you don't, not really, and pretty much any computer will run any model, thanks to the efforts of projects like llama.cpp. Even using the disk like you mentioned! That's a thing, too. It's slower, but it's entirely possible.

If you're willing to drop down to the 7B/13B models, you'll need even less RAM (you can run 7B models with less than 8GB of RAM), and they'll run radically faster.
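For a rough sense of where those RAM numbers come from, here's a back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameter count times bits per weight) and ignores the context/KV cache and runtime overhead, so treat the results as lower bounds. The function name and the bit widths are just illustrative choices on my part, and real quantization formats usually spend a bit more than a flat 4 bits per weight.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: ignores KV cache, activations, and runtime overhead,
    so actual usage will be somewhat higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare common model sizes at 16-bit (unquantized), 8-bit, and 4-bit.
for size in (7, 13, 70):
    for bits in (16, 8, 4):
        gb = estimate_weight_memory_gb(size, bits)
        print(f"{size}B @ {bits}-bit: ~{gb:.1f} GB")
```

By this estimate a 4-bit 70B model needs around 35 GB for weights, which is why it fits in 64GB of RAM, while a 4-bit 7B model needs around 3.5 GB, comfortably under 8GB.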

People have been working really hard to make it possible to run all these models on all sorts of different hardware, and I wouldn't be surprised if Llama 3 comes out in much bigger sizes than even the 70B, since hardware isn't as much of a limitation anymore.