Hacker News

by fakebizprez 69 days ago

Wrong.

If a model can run on a 512GB M3 Ultra via MLX while simultaneously benefiting from the memory bandwidth of something like an RTX 6000 Pro via CUDA, that would save my company hundreds of thousands of dollars. That's $20,000 for roughly 600GB of VRAM (512GB of unified memory plus the RTX 6000 Pro's 96GB), and enough token generation speed to meet the needs of any enterprise that's not a hyperscaler or neocloud.

I'll let someone else do the math on what it costs to build a 10U server that delivers that kind of performance without the $10K M3 Ultra Studio.

What we're paying for five old 80GB A100s is criminal, but it's nothing compared to what these GB200 Blackwell setups are going to cost in 2030. Market economics aside, the fact that they require sophisticated liquid-cooling infrastructure and draw 3x the power of the A100s will put these cards out of reach for small and medium-sized organizations.

So yeah, if there's some outside chance that we can pair NVIDIA's speed with an Arm-powered machine that offers 512GB of unified memory while drawing 50W, you better believe it's a big deal. We'll see. Sounds too good to be true.