


	by losvedir 592 days ago
	I'm curious about getting one of these to run LLMs locally, but I don't understand the cost-benefit tradeoff very well. Even 128GB can't run a state-of-the-art model like Claude 3.5 or GPT-4o, right? Conversely, even 16GB can (I think?) run a smaller, quantized Llama model. What's the sweet spot for running a capable model locally (and likely future local-scale models)?

3 comments

brandall10 592 days ago

You'll be able to run 72B models with large context, lightly quantized, with decent-ish performance, around 20-25 tok/sec. The best of the bunch are maybe 90% of a Claude 3.5.

If you need to do some work offline, or for some reason the place you work blocks access to cloud providers, it's not a bad way to go, really. Note that if you're on battery, heavy LLM use can kill your battery in an hour.
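For a rough sense of where the sweet spot sits, here is a back-of-the-envelope sketch of the memory arithmetic. It only counts the weights (parameter count times bits per weight). The 20% headroom for context/KV cache and runtime overhead is an assumption picked for illustration, not a measured figure, and the function names are hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float, ram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check: do the weights plus assumed headroom fit in ram_gb?

    overhead_fraction is an assumed allowance for context and runtime
    overhead; real usage depends on context length and the runtime used.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= ram_gb


if __name__ == "__main__":
    for params, bits, ram in [(72, 16, 128), (72, 8, 128), (72, 4, 64), (8, 4, 16)]:
        size = weight_memory_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{size:.0f} GB weights, fits in {ram} GB: {fits(params, bits, ram)}")
```

By this estimate a 72B model at 8 bits per weight needs about 72 GB for weights, which is why it is plausible on a 128GB machine but not a 64GB one, while an 8B model at 4 bits is about 4 GB and fits comfortably in 16GB.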


SkyMarshal 592 days ago

Lots of discussion and testing of that over on https://www.reddit.com/r/LocalLLaMA/, worth following if you're not already.


bufferoverflow 592 days ago

Claude 3.5 and GPT-4o are huge models. They don't run on consumer hardware.
