For enterprise markets, this is table stakes. A lot of datacenter customers will probably ignore this release altogether since there isn't a high-bandwidth option for systems interconnect.
The Mac Studio isn’t meant for data centers anyway? It’s a small and silent desktop form factor — in every respect the opposite of a design you’d want to put in a rack.
A long time ago Apple had a rackmount server called Xserve, but there’s no sign that they’re interested in updating that for the AI age.
Don't forget CI/CD farms for iOS builds, although I think it's much more cost-effective to just make Minis or Studios work, despite their nonstandard form factor.
I genuinely forgot the Mac Pro still exists. It’s been so long since I even saw one.
And I’ve had every previous Mac tower design since 1999: G4, G5, the excellent dual Xeon, the horrible black trash can… But Apple Silicon delivers so much punch in the Studio form factor, the old school Pro has become very niche.
Edit - looks like the new M3 Ultra is only available in Mac Studio anyway? So the existence of the Pro is moot here.
The Mac Studio hit a sweet spot in 2023 that the trash can Mac Pro couldn't ten years earlier. It's mostly thanks to the high integration of Apple Silicon and improved device availability and speed of Thunderbolt.
The 2013 Mac Pro was stuck forever with its original choice of Intel CPU and AMD GPU. And it was unfortunately prone to overheating due to these same components.
Apple recently announced they’re building a new plant in Texas to produce servers. Yes, they need servers for the Private Cloud Compute used by Apple Intelligence, but it doesn’t need to be only for that.
As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.
That's still not even competitive with 100G Ethernet on a per-port basis. An overall bandwidth of 480 Gbps pales in comparison with, for example, the 3200 Gbps you get with a P5 instance on EC2.
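To put those figures side by side, here is a quick back-of-the-envelope calculation. It assumes six Thunderbolt 5 ports at 80 Gbit/s each (the numbers mentioned elsewhere in this thread) and takes the 3200 Gbps P5 figure from the comment above at face value; the constant names are just for illustration.

```python
# All figures in Gbit/s. Port count and per-port speed are assumptions
# taken from the thread, not measured values.
TB5_PORT_GBPS = 80
TB5_PORTS = 6
ETHERNET_PORT_GBPS = 100
P5_AGGREGATE_GBPS = 3200

studio_total = TB5_PORT_GBPS * TB5_PORTS
per_port_ratio = TB5_PORT_GBPS / ETHERNET_PORT_GBPS
aggregate_ratio = studio_total / P5_AGGREGATE_GBPS

print(f"Aggregate: {studio_total} Gbit/s")
print(f"Per port vs 100G Ethernet: {per_port_ratio:.0%}")
print(f"Aggregate vs P5 instance: {aggregate_ratio:.0%}")
```

These are peak link rates only; real throughput over Thunderbolt networking may be considerably lower.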
But 80Gbit/s is way slower than even regular dual channel RAM, or am I missing something here? That would mean the LLM would be excruciatingly slow. You could get an old EPYC for a fraction of that price and have more performance.
If I'm not mistaken, each token produced roughly equals the whole model in memory transfers (the exception being MoE models). That's why memory bandwidth is so important in the first place, or not?
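A rough sketch of that reasoning: if decoding is bound by memory bandwidth and every active weight is read once per token, the ceiling on tokens per second is bandwidth divided by bytes read per token. The function name and the example figures are illustrative assumptions (a hypothetical 70B dense model at 8 bits per weight, roughly 800 GB/s of local memory bandwidth, and 10 GB/s for an 80 Gbit/s link), not measurements.

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          active_params: float,
                          bytes_per_param: float) -> float:
    """Upper bound on decode speed when each active weight is read once per token."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token

# Hypothetical 70B dense model at 1 byte per weight (assumed numbers).
local_ram = max_tokens_per_second(800e9, 70e9, 1.0)  # weights in local memory
over_link = max_tokens_per_second(10e9, 70e9, 1.0)   # weights streamed over the link

print(f"local memory: {local_ram:.1f} tok/s")
print(f"over an 80 Gbit/s link: {over_link:.2f} tok/s")
```

This ignores compute time, caches, and batching, so it is only an upper bound. It does show why streaming weights over the interconnect for every token would be far too slow, and why the weights need to live in each node's local memory.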
My understanding is that if you can store 1/Nth of the weights in RAM on each of the N nodes then there's no need to send the weights over the network.
You're correct about the weights: each machine could in fact store all of the weights. However I think you still have to transfer the activations and the KV-Cache while performing inference.
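For a sense of scale, here is a minimal sketch of the per-token activation traffic, assuming the layers are split across two machines so that one hidden-state vector crosses the link per generated token. The hidden size of 7168 and the 2-byte activations are assumptions for illustration (check the model's config), and link latency and any KV-cache traffic are ignored.

```python
def activation_bytes_per_token(hidden_size: int,
                               bytes_per_value: int,
                               boundaries: int) -> int:
    """Bytes sent per token when one hidden-state vector crosses each node boundary."""
    return hidden_size * bytes_per_value * boundaries

def transfer_seconds(num_bytes: int, link_bits_per_s: float) -> float:
    """Pure serialization time on the link, ignoring latency and protocol overhead."""
    return num_bytes * 8 / link_bits_per_s

# Assumed values: hidden size 7168, fp16 activations, two nodes (one boundary).
payload = activation_bytes_per_token(hidden_size=7168, bytes_per_value=2, boundaries=1)
seconds = transfer_seconds(payload, 80e9)

print(f"{payload} bytes per token, {seconds * 1e6:.2f} microseconds at 80 Gbit/s")
```

Under these assumptions the payload is a few kilobytes per token, so the link's latency is likely to matter more than its bandwidth.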
It's good enough to run whatever local model you want. 2x 80core GPU is no joke. Linking them together gives it effectively 1.6 TB/s of bandwidth. 1TB of total memory.
You can run the full Deepseek 671b q8 model at 40 tokens/s. Q4 model at 80 tokens/s. 37B active params at a time because R1 is MoE.
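As a sanity check on those numbers, the sketch below works out the memory bandwidth each quoted token rate would imply, assuming all 37B active parameters are read once per token. The function name is hypothetical, and the token rates are the ones claimed in the comment above, not benchmarks.

```python
ACTIVE_PARAMS = 37e9  # active parameters per token, as stated in the comment

def implied_bandwidth(tokens_per_s: float, bits_per_weight: int) -> float:
    """Bytes per second that must be read to sustain the given token rate."""
    bytes_per_token = ACTIVE_PARAMS * bits_per_weight / 8
    return tokens_per_s * bytes_per_token

q8 = implied_bandwidth(40, 8)  # claimed 40 tok/s at 8 bits per weight
q4 = implied_bandwidth(80, 4)  # claimed 80 tok/s at 4 bits per weight

print(f"q8 at 40 tok/s needs {q8 / 1e12:.2f} TB/s")
print(f"q4 at 80 tok/s needs {q4 / 1e12:.2f} TB/s")
```

Both cases come out to about 1.48 TB/s, just under the 1.6 TB/s combined figure, which suggests the quoted rates are close to a theoretical ceiling rather than what you would see in practice.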
Linking 2 of these together lets you run a model more capable (R1) than GPT4o at a comfortable speed at home. That was simply fantasy a year ago.
> with a vanishingly small fraction of flops and a small fraction of memory bandwidth
Is it though?
Wikipedia says [1] an M3 Max can do 14 TFLOPS of FP32, so an M3 Ultra ought to do 28 TFLOPS. nVidia claims [2] a Blackwell GPU does 80 TFLOPs of FP32. So M3 Ultra is 1/3 the speed of a Blackwell.
Calling that "a vanishingly small fraction" seems like a bit of an exaggeration.
I mean, by that metric, a single Blackwell GPU only has "a vanishingly small fraction" of the memory of an M3 Ultra. And the M3 Ultra is only burning "a vanishingly small fraction" of a Blackwell's electrical power.
nVidia likes throwing around numbers like "20 petaFLOPs" for FP4, but that's not real floating point... it's just 1990's-vintage uLaw/aLaw integer math.
Edit: Further, most (all?) of the TFLOPs numbers you see on nVidia datasheets for "Tensor FLOPs" have a little asterisk next to them saying they are "effective" TFLOPs using the sparsity feature, where half the elements of the matrix multiplication are zeroed.
I wonder if that’s something new, or just the same virtual network interface that’s been around since the TB1 days (a new network interface appears when you connect two Macs with a TB cable)
I'm super interested in the clustering capability. At launch people said they were only getting like 11Gbps from their TB4 drive arrays, which was really way less than expected.
Apple does kind of advertise that each TB port has its own controllers. Which gives me hope that whatever 1x port can do 6x can do 6x better.
AMD's Strix Halo victory feels much more shallow today. Eventually 48GB or 64GB sticks will probably expand Strix Halo to 192 then 256GB. But Strix Halo is severely I/O starved: it has basically a desktop's worth of I/O, with no easy way to connect host-to-host, and Apple absolutely understands that the use of a chip is bounded by what it can connect to. 6x TB5, if even half true, will be utterly outstanding.
It's been so so so so cool to see Non-Transparent Bridging atop Thunderbolt, so one host can act like a device. Since it's PCIe, that hypothetically would allow amazing RDMA over TB. USB4 mandates host-to-host networking, but I have no idea how it is implemented and I suspect it's nowhere near as close to the metal.
In 2017 I was working for a company that was trying to develop foundation models and I was developing a framework for training what were then large neural networks [1] and other models.
It was "yet another mac-oriented startup" but I had them get me an Alienware laptop because I could get one with a 1070 mobile card that meant I could train on my laptop whereas the data sci's had to do everything on our DGX-1. [2]
Today it is the other way around, the Mac Studio looks like the best AI development workstation you can get.
[1] I was really partial to a character-level CNN model we had
[2] The CEO presented next to Jensen Huang at an NVIDIA conference; his favorite word was "incredible". I thought it was "incredible" when I heard they got bought by Nike, but it was true.