| HN Mirror


	by smith7018 475 days ago
	You can build an x86 machine that can fully run DeepSeek R1 with 512GB VRAM for ~$2,500?

2 comments

ta988 475 days ago

You will have to explain to me how.

bmelton 475 days ago

https://digitalspaceport.com/how-to-run-deepseek-r1-671b-ful...

muricula 475 days ago

Is that a CPU based inference build? Shouldn't you be able to get more performance out of the M3's GPU?

wmf 475 days ago

Inference is about memory bandwidth and some CPUs have just as much bandwidth as a GPU.

radlad 475 days ago

https://news.ycombinator.com/item?id=42897205

hbbio 475 days ago

How would you compare the tok/sec between this setup and the M3 Max?

aurareturn 475 days ago

3.5 - 4.5 tokens/s on the $2,000 AMD Epyc setup. Deepseek 671b q4.

The AMD Epyc build is severely bandwidth and compute constrained.

~40 tokens/s on M3 Ultra 512GB by my calculation.
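
A figure like this can be sanity-checked with a bandwidth-bound estimate. The sketch below is not the commenter's calculation; it assumes decode speed is capped by how fast the active weights can be streamed from memory once per token. The inputs are assumptions: 819 GB/s is Apple's quoted peak bandwidth for the M3 Ultra, 37B is DeepSeek R1's activated parameter count per token (it is a mixture-of-experts model), and 4 bits per weight is a nominal value for Q4.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_params_billion: float,
                             bits_per_weight: float) -> float:
    """Upper bound on decode speed if each token streams all active weights once."""
    # Billions of parameters times bytes per parameter gives gigabytes per token.
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Assumed inputs (not taken from the thread).
M3_ULTRA_BANDWIDTH_GB_S = 819.0   # quoted peak, sustained bandwidth is lower
ACTIVE_PARAMS_BILLION = 37.0      # activated parameters per token
BITS_PER_WEIGHT = 4.0             # nominal Q4, ignores quantization metadata

m3_ultra_bound = decode_tokens_per_second(
    M3_ULTRA_BANDWIDTH_GB_S, ACTIVE_PARAMS_BILLION, BITS_PER_WEIGHT
)
print(f"Bandwidth-bound ceiling: {m3_ultra_bound:.1f} tokens/s")
```

This gives a ceiling of roughly 44 tokens/s, so ~40 tokens/s is close to the theoretical limit. Measured throughput would likely be lower once compute, KV-cache reads, and sustained (rather than peak) bandwidth are accounted for.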

wolfgangK 475 days ago

IMO, it would be more interesting to have a 3-way comparison of price/performance between DeepSeek 671b running on:

1. M3 Ultra 512GB
2. AMD Epyc (which Gen? AVX512 and DDR5 might make a difference in both performance and cost; Gen 4 or Gen 5 get 8 or 9 t/s: https://github.com/ggml-org/llama.cpp/discussions/11733 )
3. AMD Epyc + 4090 or 5090 running KTransformers (over 10 t/s decode? https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...)

hbbio 475 days ago

Thanks!

If the M3 can run 24/7 without overheating it's a great deal to run agents. Especially considering that it should run only using 350W... so roughly $50/mo in electricity costs.
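
The $50/mo figure can be reproduced with a quick calculation. This is a minimal sketch that assumes a constant 350W draw around the clock, a 30-day month, and an electricity price of $0.20/kWh; the price is an assumption and varies widely by region.

```python
def monthly_energy_cost(watts: float,
                        price_per_kwh: float,
                        hours_per_day: float = 24.0,
                        days: float = 30.0) -> float:
    """Cost of running a constant load for one month."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh


# Assumed price of $0.20/kWh; 350W is the figure quoted in the comment.
full_load_cost = monthly_energy_cost(350, 0.20)
print(f"350W around the clock: ${full_load_cost:.2f}/month")
```

At that price the machine uses 252 kWh and costs about $50 per month, but only if it actually sits at 350W the whole time.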

aenis 475 days ago

Out of curiosity, if you don't mind: what kind of an agent would you run 24/7 locally?

I'd assume this thing peaks at 350W (or whatever) but idles at around 40W tops?

MBCook 475 days ago

I’m guessing they might be thinking of long training jobs, as opposed to model use in an end product of some sort.

sgt 475 days ago

What kind of Nvidia-based rig would one need to achieve 40 tokens/sec on Deepseek 671b? And how much would it cost?

aurareturn 475 days ago

Around 5x Nvidia A100 80GB can fit 671b Q4. $50k just for the GPUs and likely much more when including cooling, power, motherboard, CPU, system RAM, etc.
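
The GPU count follows from a simple memory-fit calculation. The sketch below assumes a nominal 4 bits per weight for Q4 and counts weights only. Real Q4 formats store extra scale metadata, and the KV cache and activations need room too, so treat the result as a lower bound.

```python
import math


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the model weights in GB (billions of params times bytes per param)."""
    return params_billion * bits_per_weight / 8


def gpus_needed(model_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(model_gb / gpu_memory_gb)


model_gb = weights_gb(671, 4)          # nominal Q4, weights only
gpu_count = gpus_needed(model_gb, 80)  # 80GB per A100
headroom_gb = gpu_count * 80 - model_gb
print(f"{model_gb} GB of weights -> {gpu_count} GPUs, {headroom_gb} GB headroom")
```

Under these assumptions the weights take 335.5 GB, which needs five 80GB cards and leaves 64.5 GB for everything else.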

sgt 475 days ago

So the M3 Ultra is amazing value then. And from what I could tell, an equivalent AMD Epyc would still be so constrained that we're talking 4-5 tokens/s. Is this a fair assumption?

adgjlsfhk1 475 days ago

No. The advantage of Epyc is that you get 12 channels of RAM, so it should be ~6x faster than a consumer CPU.
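
The ~6x figure is the ratio of memory channels. As a rough sketch, peak DRAM bandwidth is channels times transfer rate times 8 bytes per transfer (a 64-bit channel). The DDR5-4800 speed and the dual-channel consumer baseline are assumptions for illustration; the thread does not specify either.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth, assuming 64-bit (8-byte) channels."""
    bytes_per_transfer = 8
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Assumed configurations: both use DDR5-4800, only the channel count differs.
epyc_gb_s = peak_bandwidth_gb_s(channels=12, mega_transfers_per_s=4800)
consumer_gb_s = peak_bandwidth_gb_s(channels=2, mega_transfers_per_s=4800)
ratio = epyc_gb_s / consumer_gb_s
print(f"Epyc: {epyc_gb_s:.1f} GB/s, consumer: {consumer_gb_s:.1f} GB/s, ratio: {ratio:.1f}x")
```

That works out to about 460.8 GB/s against 76.8 GB/s. These are theoretical peaks; sustained bandwidth is lower on both platforms.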

Aeolun 475 days ago

The Epyc would only set you back $2000 though, so it’s only a slightly worse price/return.

SkiFire13 475 days ago

How many tokens/s would that be though?

sgt 475 days ago

That's what I'm trying to get to. I'm looking to set up a rig, and AMD Epyc seems reasonable, but I'd rather go Mac if it gives many more tokens per second. It does sound like the Mac with M3 Ultra will easily give 40 tokens/s, whereas the Epyc is just too constrained internally, giving 4-5 tokens/s, but I'd like someone to confirm that instead of buying the HW and finding out myself. :)

aurareturn 474 days ago

Probably a lot more. Those are server-grade GPUs. We're talking prosumer grade Macs.

I don't know how to calculate tokens/s for H100s linked together. ChatGPT might help you though. :)
