HN Mirror

by cafed00d 508 days ago

Absolutely! Even for inference! The SOTA models for all commercial purposes need to run on a consumer’s device.

Running Grok2, DeepSeek, or even Llama 405B requires roughly 400-500 GB of memory.
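As a rough sanity check on figures like these, the memory needed just to hold a model's weights is its parameter count times the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only. It assumes decimal gigabytes and ignores the KV cache, activations, and runtime overhead, all of which add more on top. The function name is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate memory (decimal GB) needed to store model weights.

    Weights only: KV cache, activations and framework overhead are ignored.
    """
    bytes_per_param = bits_per_param / 8
    # params_billions * 1e9 params * bytes_per_param bytes / 1e9 = GB (decimal)
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # A 405B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"405B @ {bits:>2}-bit: {weight_memory_gb(405, bits):.1f} GB")
```

Under these assumptions a 405B model needs about 405 GB at 8 bits per weight and about 810 GB at 16 bits, before any runtime overhead.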

Buying a tinybox with enough GPU memory costs $15k-25k, and building an equivalent machine yourself costs about the same.

A distributed Mac cluster of 2-3 M2 Ultras, each with 192 GB of memory, costs about the same, if not more.
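To see how many machines a given model needs, divide the required memory by the memory per machine and round up. This is a minimal sketch that assumes memory pools perfectly across machines; in practice the OS reserves part of each machine's memory and distributing a model adds overhead, so usable capacity is lower. The function name is hypothetical.

```python
import math


def machines_needed(required_gb: float, per_machine_gb: float) -> int:
    """Smallest number of machines whose combined memory covers required_gb.

    Assumes memory pools perfectly across machines with no overhead.
    """
    if per_machine_gb <= 0:
        raise ValueError("per_machine_gb must be positive")
    return math.ceil(required_gb / per_machine_gb)


if __name__ == "__main__":
    # 192 GB per M2 Ultra, against the 400-500 GB range mentioned above.
    for required in (400, 500):
        print(f"{required} GB -> {machines_needed(required, 192)} x 192 GB machines")
```

Under these assumptions, two 192 GB machines (384 GB) fall just short of 400 GB, so a cluster for this range lands at three.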

So people are absolutely constrained by price and supply here. Every engineer, analyst, and scientist would be far less tethered by rules and regulations, or by the nitty-gritty of policies and terms of service, if they could trust that the LLM they use is completely local, free of telemetry and tracking, and fairly licensed for commercial use (which perhaps excludes Llama).

Not many people can afford to spend $15k-30k on a computer that can run these SOTA LLMs. But you can bet a billion will buy one when it's $1k.

fragmede 508 days ago

Not to mention, the north star is to get to a place where we have the hardware to do training at home. We're a long way off, but once we're free of the restriction of needing that hardware, ideally we'd build the model so that it is continually being trained.
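One way to see why training at home is further off than inference: a commonly cited rule of thumb for mixed-precision training with the Adam optimizer is about 16 bytes per parameter (16-bit weights and gradients, plus 32-bit master weights and two 32-bit optimizer moments), before counting activations, versus about 2 bytes per parameter to hold 16-bit weights for inference. The sketch below simply applies that rule of thumb; the byte counts and names are assumptions, not measurements of any particular framework.

```python
# Rule-of-thumb bytes per parameter (assumptions, not framework measurements).
INFERENCE_FP16_BYTES = 2  # 16-bit weights only

TRAINING_MIXED_ADAM_BYTES = (
    2    # 16-bit weights
    + 2  # 16-bit gradients
    + 4  # fp32 master copy of the weights
    + 4  # fp32 Adam first moment
    + 4  # fp32 Adam second moment
)  # = 16 bytes; activations are not included


def memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Decimal GB for params_billions * 1e9 parameters at bytes_per_param each."""
    return params_billions * bytes_per_param


if __name__ == "__main__":
    params = 405  # billions
    print(f"inference: {memory_gb(params, INFERENCE_FP16_BYTES):.0f} GB")
    print(f"training:  {memory_gb(params, TRAINING_MIXED_ADAM_BYTES):.0f} GB")
```

By this estimate, training needs roughly 8x the memory of 16-bit inference for the same model, and that is before activations.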

radlad 508 days ago

Keep in mind too that to run DeepSeek R1 you'd need 768 GB, so essentially 4 tinyboxes.

fragmede 508 days ago

The full one, sure, but there are quantizations runnable on far smaller machines.

radlad 508 days ago

Quantized models don't perform as well. The question was about running one of the cloud models.
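To make the trade-off concrete: quantization stores each weight with fewer bits, which saves memory but introduces rounding error that grows as the bit width shrinks. The sketch below uses simple symmetric per-tensor rounding on random weights. Real quantization schemes for local LLMs (per-group scales, outlier handling, and so on) are more sophisticated, and reconstruction error is only a loose proxy for how much model quality degrades. The function names are hypothetical.

```python
import random


def quantize_dequantize(weights: list[float], bits: int) -> list[float]:
    """Symmetric per-tensor quantization to signed `bits`-bit ints, then back to float."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)  # all-zero tensor: nothing to quantize
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]


def mean_abs_error(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    for bits in (8, 4):
        err = mean_abs_error(weights, quantize_dequantize(weights, bits))
        print(f"{bits}-bit mean abs error: {err:.6f}")
```

With fewer levels to round to, the 4-bit version's step size is much coarser than the 8-bit one, so its error is correspondingly larger.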

aurareturn 508 days ago

This was true before DeepSeek.