by Gracana 478 days ago
It's pretty close. A 3090 or 4090 has about 1TB/s of memory bandwidth, while the top Apple chips have a bit over 800GB/s. Where you'll see a big difference is in prompt processing. Without the compute power of a pile of GPUs, chewing through long prompts, code, documents, etc. is going to be slower.
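
To put rough numbers on "pretty close": a common back-of-the-envelope estimate treats token generation as memory-bandwidth bound, since each generated token has to stream the model weights from memory once. The sketch below applies that estimate to the two bandwidth figures above. The 20GB model size and the function name are assumptions picked for illustration, and the result is a ceiling, not a benchmark: it ignores KV cache traffic and overhead, and it says nothing about prompt processing, which is limited by compute instead.

```python
# Rough ceiling on decode speed when generation is memory-bandwidth bound:
# each generated token requires reading all model weights from memory once.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling: GB/s available divided by GB read per token."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical model footprint in memory (assumption, not from the thread).
MODEL_SIZE_GB = 20.0

# Approximate bandwidth figures quoted in the comment above, in GB/s.
BANDWIDTH_GB_S = {
    "3090/4090 (~1TB/s)": 1000.0,
    "top Apple chip (~800GB/s)": 800.0,
}

for name, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions the two ceilings differ by the same 20% as the bandwidth figures, which is why generation speed ends up in the same ballpark while prompt processing does not.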

Nobody in industry is using a 4090; they are using H100s, which have 3TB/s. Apple also doesn’t have any equivalent to NVLink.

I agree that compute is likely to become the bottleneck for these new Apple chips, given they only have something like 0.1% of the FLOPS.

I chose the 3090/4090 because it seems to me that this machine could be a replacement for a workstation or a homelab rig at a similar price point, but not a $100-250k server in a datacenter. It's not really surprising or interesting that the datacenter GPUs are superior.

FWIW I went the route of "bunch of GPUs in a desktop case" because I felt having the compute oomph was worth it.

4.8TB/s on H200, 8TB/s on B200, pretty insane.

That’s wild, somehow I hadn’t seen the B200 specs before now. I wish I could have even a fraction of that!