Hacker News
by cafed00d 508 days ago
Absolutely! Even for inference! The SOTA models for all commercial purposes need to run on a consumer’s device.

Running Grok2, DeepSeek, or even Llama 405B requires roughly 400-500 GB of memory.
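For a rough sanity check on those numbers, here's a back-of-the-envelope sketch. It counts weight storage only (no KV cache, activations, or runtime overhead), treats 1 GB as 10^9 bytes, and assumes the published parameter counts of 405B for Llama 405B and 671B for DeepSeek V3/R1. The function name is just something I made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes each / (1e9 bytes per GB)
    return params_billions * bytes_per_weight


# Assumed parameter counts; real deployments need extra headroom on top.
print(weight_memory_gb(405, 8))   # Llama 405B at 8 bits per weight
print(weight_memory_gb(671, 8))   # DeepSeek V3/R1 at 8 bits per weight
```

At 8 bits per weight the Llama figure lands inside the 400-500 GB range; anything stored at 16 bits doubles it.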

Buying a tinybox with enough GPU memory costs $15k-25k, and building your own comes out about the same.

A distributed Mac cluster costs about the same, if not more, if you’re buying 2-3 M2 Ultras, each with 192 GB of memory.

So people are absolutely constrained by price/supply here. Every engineer, analyst, and scientist would be far less tethered to rules & regulations or policy & terms-of-service nitty-gritty if they could trust that the LLM they use is completely local, free of telemetry and tracking, and licensed fairly for commercial use (perhaps this excludes Llama).

Not a lot of people can afford to spend $15k-30k on a computer that can run these SOTA LLMs. But you can bet a billion will buy one when it’s $1k.

2 comments

Not to mention, the north star is to get to a place where we have the hardware to do training at home. We're a long way off, but without the hardware restriction, ideally we'd build the model so that it is continually being trained.
Keep in mind too that to run DeepSeek R1 you'd need 768 GB, so essentially 4 tinyboxes.
The full one, sure, but there are quantizations runnable on far smaller machines.
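To put rough numbers on the quantization point, here's a minimal sketch. It assumes 671B total parameters for DeepSeek R1, counts weights only (no KV cache or runtime overhead), and uses the 192 GB per machine figure quoted upthread for an M2 Ultra. The helper name is hypothetical, and the machine counts are lower bounds.

```python
import math

PARAMS_BILLIONS = 671   # assumed: DeepSeek R1 total parameter count
MACHINE_GB = 192        # per-machine memory quoted upthread (M2 Ultra)


def footprint(bits_per_weight):
    """Return (weight memory in GB, minimum machine count) for a bit width."""
    gb = PARAMS_BILLIONS * bits_per_weight / 8
    machines = math.ceil(gb / MACHINE_GB)
    return gb, machines


for bits in (16, 8, 4):
    gb, machines = footprint(bits)
    print(f"{bits:>2}-bit: {gb:7.1f} GB of weights, at least {machines} x {MACHINE_GB} GB machines")
```

This only covers the memory side; it says nothing about how much quality a given quantization gives up.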
Quantized models don't perform as well. The question was about running one of the cloud models.
This was true before DeepSeek.