Hacker News
by PenguinRevolver 1042 days ago
I feel as if the cheapest way of running these kinds of models would be to keep the whole cache/memory on the hard drive rather than in RAM. Then you could just use CPU power instead of splurging thousands on RAM and a GPU with enough VRAM.

It might or might not run at reasonable speeds, but I would reason that it could avoid a kind of "sunk cost irony": spending all that money and then deciding, at some point, that ChatGPT would have sufficed for your task. It's rare, but it can happen.

If you want to take this silly logic further, you can theoretically run a model of any size on any computer. You could even attempt this dumb idea on a computer running Windows 95. I don't care how long it would take; if it takes seven and a half million years for 42 tokens, I would still call it a success!

You are right about that being the cheapest, of course, in the sense that 64GB of HDD space is always going to be cheaper than 64GB of RAM. But when you say

> thousands for RAM

I wonder if your perspective might be a little off - you can get 64GB DDR4 RAM for ~$100, it’s really not a big deal these days.

It’s a big deal on a Mac, of course, where 64GB means a big, kitted-out, high-end model that costs thousands, but elsewhere RAM really is that cheap.

Understandable; the reason I said "thousands for RAM" was that when I wrote that sentence, I was lumping the theoretical RAM and GPU prices together. Oh well.

My apologies; I think the bit of context missing from my response is that you don't need a GPU at all. 64GB of RAM will suffice to run a quantized 70B model on your CPU, and it won't even be -that- slow; you'll get a few tokens per second.
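The 64GB figure is easy to sanity-check with back-of-the-envelope arithmetic: the weights alone take roughly (parameters × bits per weight ÷ 8) bytes. This is a rough estimate, not a measurement. It ignores the KV cache, the OS, and runtime overhead, and real 4-bit formats also store per-block scales, so their effective size lands somewhat above 4 bits per weight. `weight_memory_gb` is a hypothetical helper name used only for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    ram_gb = 64
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        size = weight_memory_gb(70e9, bits)
        fits = "fits" if size < ram_gb else "does not fit"
        print(f"70B @ {label}: ~{size:.0f} GB of weights, {fits} in {ram_gb} GB")
```

At 16 or 8 bits a 70B model does not fit in 64GB, so running it in that much RAM assumes a roughly 4-bit quantized model, which leaves some headroom for everything else.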

So while a lot of us think you need to splurge to get into LLMs, the reality is you don't, not really: pretty much any computer will run any model, thanks to the efforts of projects like llama.cpp. Even using the disk, like you mentioned, is a thing, too. It's slower, but it's entirely possible.
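Running from disk usually relies on memory-mapping: the weight file is mapped into the process's address space, and the OS pages in only the bytes a computation actually touches. Below is a minimal sketch of that general idea using Python's standard-library `mmap` and `struct` modules, assuming a toy file of little-endian float32 values laid out row by row. The file format and the helpers `write_weights`, `dot_row_from_disk`, and `read_row_demo` are hypothetical; this is not how llama.cpp stores or reads its models.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per little-endian float32 value in the toy file


def write_weights(path: str, weights: list[float]) -> None:
    """Write a flat list of float32 'weights' to disk (a toy stand-in for a model file)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(weights)}f", *weights))


def dot_row_from_disk(mm: mmap.mmap, row: int, row_len: int, x: list[float]) -> float:
    """Dot product of one weight row with x, reading only that row's bytes.

    The OS pages the touched bytes in from disk on demand, so the whole
    file never has to be loaded into RAM at once.
    """
    offset = row * row_len * FLOAT_SIZE
    row_vals = struct.unpack_from(f"<{row_len}f", mm, offset)
    return sum(w * xi for w, xi in zip(row_vals, x))


def read_row_demo() -> float:
    rows, cols = 3, 4
    weights = [float(i) for i in range(1, rows * cols + 1)]  # toy 3x4 matrix, values 1..12
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_weights(path, weights)
        # Map the file read-only; nothing is read until bytes are accessed.
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Row 1 holds 5, 6, 7, 8; dotted with all-ones it sums them.
            return dot_row_from_disk(mm, row=1, row_len=cols, x=[1.0] * cols)


if __name__ == "__main__":
    print(read_row_demo())
```

If the mapped file is larger than free RAM, the OS evicts pages and reads them back from disk as needed. That is why this approach works on small machines, and also why it is slow.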

If you're willing to drop down to the 7B/13B models, you'll need even less RAM (you can run 7B models with less than 8GB of RAM), and they'll run radically faster.
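One common rule of thumb for why smaller models run so much faster on a CPU: generating each token requires reading roughly every weight once, so speed is often limited by memory bandwidth rather than compute, giving an upper bound of about bandwidth ÷ model size. The sketch below assumes a hypothetical 50 GB/s for RAM and a hypothetical 3 GB/s for a fast SSD (real figures vary a lot by machine), and uses the approximate 4-bit weight sizes of 70B (~35GB) and 7B (~3.5GB) models. Treat the results as ceilings, not predictions; `max_tokens_per_second` is a hypothetical helper.

```python
def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each generated token streams all weights once.

    Ignores compute limits, CPU caches, and KV-cache reads, so real speed is lower.
    """
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Hypothetical bandwidth figures (assumptions); measure your own machine for real numbers.
    sources = {"RAM (~50 GB/s, assumed)": 50.0, "SSD (~3 GB/s, assumed)": 3.0}
    # Approximate 4-bit weight sizes in GB.
    models = {"70B @ 4-bit": 35.0, "7B @ 4-bit": 3.5}
    for source, bw in sources.items():
        for model, size in models.items():
            print(f"{model} from {source}: <= {max_tokens_per_second(size, bw):.2f} tokens/s")
```

A 7B model is about ten times smaller than a 70B one at the same quantization, so its ceiling is about ten times higher; streaming the weights from disk instead of RAM lowers the ceiling further, which is the "slower" part of running from disk.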

People have been working really hard to make it possible to run all these models on all sorts of different hardware, and I wouldn't be surprised if Llama 3 comes out in much bigger sizes than even the 70B, since hardware isn't as much of a limitation anymore.