by tgtweak 801 days ago
The extra $3k you'd spend on a quad-4090 rig vs the top MBP... ignoring the fact you can't put the two on even ground for versatility (very few libraries are adapted to Apple Silicon, let alone optimized).

Very few people who would consider an H100/A100/A800 are going to be cross-shopping a MacBook Pro for their workloads.


> very few libraries are adapted to Apple Silicon, let alone optimized

This is a joke, right? Have you been anywhere in the LLM ecosystem for the past year or so? I'm constantly hearing about new ways in which ASi outperforms traditional platforms, and about new projects that are optimized for ASi, such as llama.cpp.

Nothing compared to Nvidia though. The FLOPS and memory bandwidth are simply not there.
The memory bandwidth of the M2 Ultra is around 800 GB/s versus 1008 GB/s for the 4090. While it’s true the M2 has neither the bandwidth nor the GPU power, it is not limited to 24 GB of VRAM per card. With its 192 GB upper limit, the M2 Ultra will have a much easier time running inference on a 70+ billion parameter model, if that is your aim.
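
To put rough numbers on the capacity point, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameters times bits per weight, ignores KV cache, activations, and OS overhead, and treats "bandwidth divided by weight bytes" as a loose upper bound on single-stream tokens per second. The function names are made up for illustration; the 800 GB/s, 1008 GB/s, 24 GB, and 192 GB figures are the ones quoted in this thread.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory (no KV cache, no overhead)."""
    return weights_gb(params_billion, bits_per_weight) <= memory_gb


def max_tokens_per_s(bandwidth_gb_s: float, params_billion: float,
                     bits_per_weight: float) -> float:
    """Loose ceiling: every generated token reads all weights from memory once."""
    return bandwidth_gb_s / weights_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weights_gb(70, bits)
        print(f"70B @ {bits}-bit: {size:.0f} GB, "
              f"fits in 24 GB: {fits(70, bits, 24)}, "
              f"fits in 192 GB: {fits(70, bits, 192)}")

    # Bandwidth-bound ceilings for a 4-bit 70B model, if it fit in memory at all.
    print(f"M2 Ultra ceiling: {max_tokens_per_s(800, 70, 4):.1f} tok/s")
    print(f"4090 ceiling:     {max_tokens_per_s(1008, 70, 4):.1f} tok/s")
```

Under these assumptions a 70B model does not fit on a single 24 GB card even at 4 bits per weight, so the 4090's bandwidth lead only matters once the model is split or shrunk further.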

Besides size, heat, fan noise, and not having to build it yourself, this is the only area where Apple Silicon might have an advantage over a homemade 4090 rig.

It doesn't need GPU power to beat the 4090 in benchmarks: https://appleinsider.com/articles/23/12/13/apple-silicon-m3-...
It doesn't beat RTX 4090 when it comes to actual LLM inference speed. I bought a Mac Studio for local inference because it was the most convenient way to get something fast enough and with enough RAM to run even 155b models. It's great for that, but ultimately it's not magic - Nvidia hardware still offers more FLOPS and faster RAM.
> It doesn't beat RTX 4090 when it comes to actual LLM inference speed

Sure, whisper.cpp is not an LLM. The 4090 can't even do inference at all on anything over 24GB, while ASi can chug through it even if slightly slower.

I wonder if with https://github.com/tinygrad/open-gpu-kernel-modules (the 4090 P2P patches) it might become a lot faster to split a too-large model across multiple 4090s and still outperform ASi (at least until someone at Apple does an MLX LLM).
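
For a rough sense of how many 24 GB cards such a split would take, here is a sketch under loud assumptions: weights only, an even split across cards, and a hypothetical `usable_fraction` knob (my invention, not a real driver setting) to leave headroom for KV cache and activations. It says nothing about the P2P patches or actual throughput.

```python
import math


def cards_needed(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 24.0, usable_fraction: float = 0.9) -> int:
    """Minimum number of cards whose usable VRAM covers the weights.

    usable_fraction is a made-up headroom factor, not a measured value.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return math.ceil(weights_gb / (vram_gb * usable_fraction))


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"70B @ {bits}-bit: {cards_needed(70, bits)} x 24 GB cards")
```

By this estimate a quad-4090 rig (96 GB total) covers a 70B model at 8 bits or below, but not at 16 bits.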

Yeah. Let me just walk down to Best Buy and get myself a GPU with over 24 gigabytes of VRAM (impossible) for less than $3,000 (even more impossible). Then tell me ASi is nothing compared to Nvidia.

Even the A100 for something around $15,000 (edit: used to say $10,000) only goes up to 80 gigabytes of VRAM, but a 192GB Mac Studio goes for under $6,000.

Those figures alone prove Nvidia isn't even competing in the consumer or even the enthusiast space anymore. They know you'll buy their hardware if you really need it, so they aggressively segment the market with VRAM restrictions.

Where are you getting an A100 80GB for $10k?
Oops, I remembered it being somewhere near $15k but Google got confused and showed me results for the 40GB instead so I put $10k by mistake. Thanks for the correction.

A100 80GB goes for around $14,000 - $20,000 on eBay and A100 40GB goes for around $4,000 - $6,000. New (not from eBay - from PNY and such), it looks like an 80GB would set you back $18,000 to $26,000 depending on whether you want HBM2 or HBM2e.

Meanwhile you can buy a Mac Studio today without going through a distributor, and it's under $6,000 if the only thing you care about is having 192GB of Unified Memory.
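
Worked out as dollars per GB of memory, using only the prices quoted in this thread (eBay and distributor ranges for the A100, and $6,000 as an upper bound for the 192GB Mac Studio). This is a quick sketch; the helper name and the dictionary layout are just for illustration, and it compares capacity only, not FLOPS or bandwidth.

```python
def dollars_per_gb(price_usd: float, memory_gb: float) -> float:
    """Price divided by memory capacity."""
    return price_usd / memory_gb


# (price in USD, memory in GB), taken from the figures quoted in the thread.
QUOTES = {
    "A100 40GB, eBay low": (4_000, 40),
    "A100 40GB, eBay high": (6_000, 40),
    "A100 80GB, eBay low": (14_000, 80),
    "A100 80GB, eBay high": (20_000, 80),
    "A100 80GB, new low": (18_000, 80),
    "A100 80GB, new high": (26_000, 80),
    "Mac Studio 192GB (upper bound)": (6_000, 192),
}

if __name__ == "__main__":
    for name, (price, gb) in QUOTES.items():
        print(f"{name}: ${dollars_per_gb(price, gb):.2f}/GB")
```

On these numbers the Mac Studio lands at roughly $31/GB, against $100 to $325/GB for the A100 listings.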

And while the memory bandwidth isn't quite as high as the 4090's, the M-series chips can run certain models faster anyway, if Apple is to be believed.