	by bigyabai 476 days ago
	For enterprise markets, this is table stakes. A lot of datacenter customers will probably ignore this release altogether since there isn't a high-bandwidth option for systems interconnect.

4 comments

pavlov 476 days ago

The Mac Studio isn’t meant for data centers anyway? It’s a small and silent desktop form factor — in every respect the opposite of a design you’d want to put in a rack.

A long time ago Apple had a rackmount server called Xserve, but there’s no sign that they’re interested in updating that for the AI age.

link

bigyabai 476 days ago

It's the Ultra chip, the same one that goes into the rackmount Mac Pro. I don't think there's much confusion as to who this is for.

> there’s no sign that they’re interested in updating that for the AI age.

https://security.apple.com/blog/private-cloud-compute/

link

wtallis 476 days ago

The rackmount Mac Pro is for A/V studios, not datacenters.

link

phillco 476 days ago

Don't forget CI/CD farms for iOS builds, although I think it's much more cost-effective to just make Minis or Studios work, despite their nonstandard form factor.

link

kridsdale1 475 days ago

Google and Facebook have vast fleets of Minis in custom chassis for this purpose.

link

pavlov 476 days ago

I genuinely forgot the Mac Pro still exists. It’s been so long since I even saw one.

And I’ve had every previous Mac tower design since 1999: G4, G5, the excellent dual Xeon, the horrible black trash can… But Apple Silicon delivers so much punch in the Studio form factor, the old school Pro has become very niche.

Edit - looks like the new M3 Ultra is only available in Mac Studio anyway? So the existence of the Pro is moot here.

link

choilive 475 days ago

Never understood the hate for the trash can. Isn't the Mac Studio basically the same idea as the trash can, but even less upgradeable?

link

pavlov 475 days ago

The Mac Studio hit a sweet spot in 2023 that the trash can Mac Pro couldn't ten years earlier. It's mostly thanks to the high integration of Apple Silicon and improved device availability and speed of Thunderbolt.

The 2013 Mac Pro was stuck forever with its original choice of Intel CPU and AMD GPU. And it was unfortunately prone to overheating due to these same components.

link

wtallis 475 days ago

The trash can also suffered from hitting the market right around when the industry gave up on making dual-GPU work.

link

pjmlp 475 days ago

Folks who want to keep the customisation aspect of the Mac Pro hardly see it that way.

In fact, a very famous podcaster is still holding on to his.

link

TylerE 475 days ago

The Studio also hits a sweet spot for home users like me that want tons of IO and no built in input devices.

link

Alupis 476 days ago

Outside of extremely niche use cases, who is racking Apple products in 2025?

link

nordsieck 476 days ago

There's MacMiniVault (née MacMiniColo): https://www.macminivault.com/

Not sure if they count as niche or not.

link

kube-system 476 days ago

Every provider that offers macOS in the cloud.

link

Alupis 475 days ago

So macOS is still not allowed to be virtualized per the EULA? Wow, if that's true...

link

kube-system 475 days ago

macOS is permitted to be virtualized... as long as the host is a Mac. :)

link

wpm 476 days ago

AWS

link

waveringana 476 days ago

GitHub, for their macOS runners (pretty sure they're M1 Minis).

link

alwillis 475 days ago

Apple recently announced they’re building a new plant in Texas to produce servers. Yes, they need servers for the Private Cloud Compute used by Apple Intelligence, but the plant doesn’t need to be only for that.

From https://www.apple.com/newsroom/2025/02/apple-will-spend-more...

As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.

link

phonon 476 days ago

Thunderbolt 5 can do 80 Gbps bi-directional... and the Mac Studio with the Ultra chip has 6 ports...

link

cibyr 476 days ago

That's still not even competitive with 100G Ethernet on a per-port basis. An overall bandwidth of 480 Gbps pales in comparison with, for example, the 3200 Gbps you get with a P5 instance on EC2.
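As a rough check on the figures in this comment, the arithmetic can be sketched as follows. The port count and rates are the ones quoted in the thread; the function name is only illustrative, and these are nominal link rates rather than measured throughput.

```python
# Nominal link rates quoted in the thread, in Gbps.
TB5_PORT_GBPS = 80      # Thunderbolt 5, per port
TB5_PORTS = 6           # ports on the Mac Studio with the Ultra chip
ETH_100G_GBPS = 100     # 100G Ethernet, per port
P5_TOTAL_GBPS = 3200    # EC2 P5 networking figure cited in the comment


def aggregate_gbps(per_port_gbps: float, ports: int) -> float:
    """Sum of nominal per-port rates; ignores protocol overhead."""
    return per_port_gbps * ports


studio_total = aggregate_gbps(TB5_PORT_GBPS, TB5_PORTS)
per_port_ratio = TB5_PORT_GBPS / ETH_100G_GBPS
aggregate_ratio = studio_total / P5_TOTAL_GBPS

print(f"Mac Studio aggregate: {studio_total:.0f} Gbps")
print(f"Per port vs 100GbE:   {per_port_ratio:.0%}")
print(f"Aggregate vs P5:      {aggregate_ratio:.0%}")
```

On these nominal numbers, the six ports together reach well under a fifth of the P5 figure.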

link

phonon 475 days ago

A 3 year reservation of a P5 is over a million dollars though? Not sure how that's comparable....

link

nyrikki 476 days ago

To add to this, GPU servers like those from Supermicro have a 400GbE port per GPU, plus more for the CPU.

link

kridsdale1 475 days ago

Cost competitive though?

link

spiderfarmer 476 days ago

You can use Thunderbolt 5 interconnect (80Gbps) to run LLMs distributed across 4 or 5 Mac Studios.

link

atwrk 476 days ago

But 80Gbit/s is way slower than even regular dual channel RAM, or am I missing something here? That would mean the LLM would be excruciatingly slow. You could get an old EPYC for a fraction of that price and have more performance.
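To put the link and the RAM in the same units, here is a minimal sketch. The choice of dual-channel DDR5-5600 with 64-bit channels is an assumption made for illustration (the comment only says "regular dual channel RAM"), and both figures are theoretical peaks.

```python
def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit_per_s / 8


def dram_peak_gbyte_per_s(mega_transfers_per_s: float, channels: int,
                          bus_width_bits: int = 64) -> float:
    """Theoretical peak: transfers/s x bytes per transfer x channels."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


tb5 = gbit_to_gbyte_per_s(80)                   # Thunderbolt 5 rate from the thread
ddr5 = dram_peak_gbyte_per_s(5600, channels=2)  # ASSUMPTION: dual-channel DDR5-5600

print(f"Thunderbolt 5:          {tb5:.1f} GB/s")
print(f"Dual-channel DDR5-5600: {ddr5:.1f} GB/s")
print(f"RAM / link ratio:       {ddr5 / tb5:.1f}x")
```

Under that assumption the link is roughly an order of magnitude slower than main memory, which is why it matters what actually has to cross it.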

link

wmf 476 days ago

The weights don't go over the network so performance is OK.

link

atwrk 476 days ago

If I'm not mistaken, each token produced roughly equals the whole model in memory transfers (the exception being MoE models). That's why memory bandwidth is so important in the first place, or not?

link

wmf 475 days ago

My understanding is that if you can store 1/Nth of the weights in RAM on each of the N nodes then there's no need to send the weights over the network.

link

unsatchmo 475 days ago

You're correct about the weights: each machine could in fact store all of the weights. However, I think you still have to transfer the activations and the KV cache while performing inference.
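A back-of-the-envelope sketch of what crosses the network per token if the layers are split across nodes. Everything here is an assumption for illustration: the thread does not say which parallelism scheme is used, so this models a simple pipeline-style split in which each node keeps the weights and KV cache for its own layers and only one hidden-state vector crosses each node boundary. The hidden size of 7168 and the 2-byte activations are likewise assumed values, and link latency is ignored.

```python
def weights_per_node_gb(total_weight_gb: float, nodes: int) -> float:
    """Weights held by each node when layers are split evenly."""
    return total_weight_gb / nodes


def activation_bytes_per_token(hidden_size: int, bytes_per_value: int,
                               nodes: int) -> int:
    """Bytes crossing the network per generated token in a pipeline split:
    one hidden-state vector per boundary between consecutive nodes."""
    return hidden_size * bytes_per_value * (nodes - 1)


def link_time_us(num_bytes: int, link_gbit_per_s: float) -> float:
    """Pure transfer time in microseconds; ignores latency and overhead."""
    return num_bytes * 8 / (link_gbit_per_s * 1e9) * 1e6


HIDDEN_SIZE = 7168   # ASSUMPTION: hidden size of the model being served
BYTES_PER_VALUE = 2  # ASSUMPTION: 16-bit activations
NODES = 2

per_token = activation_bytes_per_token(HIDDEN_SIZE, BYTES_PER_VALUE, NODES)
print(f"Weights per node: {weights_per_node_gb(671, NODES):.1f} GB")
print(f"Network traffic:  {per_token} bytes per token")
print(f"Transfer time:    {link_time_us(per_token, 80):.2f} us at 80 Gbps")
```

Under these assumptions the per-token traffic is kilobytes rather than gigabytes, so latency, not bandwidth, is the more likely cost of the link. Other schemes, such as tensor parallelism, exchange data at every layer and would look different.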

link

whimsicalism 476 days ago

Why you would ever want to do that remains an open question.

link

aurareturn 476 days ago

Probably some kind of local LLM server. 1TB of 1.6 TB/s memory if you link 2 together. $20k total. Half the price of a single Blackwell chip.

link

whimsicalism 476 days ago

with a vanishingly small fraction of flops and a small fraction of memory bandwidth

link

aurareturn 476 days ago

It's good enough to run whatever local model you want. 2x 80core GPU is no joke. Linking them together gives it effectively 1.6 TB/s of bandwidth. 1TB of total memory.

You can run the full DeepSeek 671B Q8 model at 40 tokens/s, and the Q4 model at 80 tokens/s. Only 37B params are active at a time because R1 is MoE.

Linking 2 of these together lets you run a model more capable (R1) than GPT-4o at a comfortable speed at home. That was simply fantasy a year ago.
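The token rates in this comment can be compared against a simple bandwidth ceiling: if every active weight is read once per generated token, decode speed cannot exceed memory bandwidth divided by the bytes of active weights. The sketch below uses the comment's own figures (1.6 TB/s, 671B total and 37B active parameters). Treating Q8 as 1 byte per parameter and Q4 as 0.5 bytes, and treating the two machines' bandwidth as fully additive, are simplifying assumptions.

```python
def decode_ceiling_tokens_per_s(bandwidth_bytes_per_s: float,
                                active_params: float,
                                bytes_per_param: float) -> float:
    """Upper bound on decode speed if each active weight is read once per token."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)


def weight_footprint_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (no KV cache, no overhead)."""
    return total_params * bytes_per_param / 1e9


BANDWIDTH = 1.6e12    # bytes/s, aggregate figure quoted in the comment
ACTIVE_PARAMS = 37e9  # active parameters per token (MoE), from the comment
TOTAL_PARAMS = 671e9  # total parameters, from the comment

# ASSUMPTION: Q8 ~ 1 byte/param, Q4 ~ 0.5 byte/param, ignoring scale metadata.
q8 = decode_ceiling_tokens_per_s(BANDWIDTH, ACTIVE_PARAMS, 1.0)
q4 = decode_ceiling_tokens_per_s(BANDWIDTH, ACTIVE_PARAMS, 0.5)

print(f"Q8: {weight_footprint_gb(TOTAL_PARAMS, 1.0):.1f} GB, ceiling {q8:.0f} tokens/s")
print(f"Q4: {weight_footprint_gb(TOTAL_PARAMS, 0.5):.1f} GB, ceiling {q4:.0f} tokens/s")
```

The quoted 40 and 80 tokens/s sit just under these ceilings, so they read as best-case estimates rather than measurements.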

link

burnerthrow008 475 days ago

> with a vanishingly small fraction of flops and a small fraction of memory bandwidth

Is it though?

Wikipedia says [1] an M3 Max can do 14 TFLOPS of FP32, so an M3 Ultra ought to do 28 TFLOPS. nVidia claims [2] a Blackwell GPU does 80 TFLOPs of FP32. So M3 Ultra is 1/3 the speed of a Blackwell.

Calling that "a vanishingly small fraction" seems like a bit of an exaggeration.

I mean, by that metric, a single Blackwell GPU only has "a vanishingly small fraction" of the memory of an M3 Ultra. And the M3 Ultra is only burning "a vanishingly small fraction" of a Blackwell's electrical power.

nVidia likes throwing around numbers like "20 petaFLOPs" for FP4, but that's not real floating point... it's just 1990s-vintage µ-law/A-law integer math.

[1] https://en.wikipedia.org/wiki/Apple_silicon#Comparison_of_M-...

[2] https://resources.nvidia.com/en-us-blackwell-architecture/da...

Edit: Further, most (all?) of the TFLOPs numbers you see on nVidia datasheets for "Tensor FLOPs" have a little asterisk next to them saying they are "effective" TFLOPs using the sparsity feature, where half the elements of the matrix multiplication are zeroed.
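The comparison in this comment reduces to two small calculations, sketched below with the comment's own numbers. Doubling the M3 Max figure for the Ultra and halving an "effective" figure to undo 2:1 sparsity both follow the comment's reasoning; neither is a vendor-confirmed figure.

```python
def throughput_ratio(a_tflops: float, b_tflops: float) -> float:
    """How fast chip A is relative to chip B on the same metric."""
    return a_tflops / b_tflops


def dense_from_effective(effective_tflops: float,
                         sparsity_factor: float = 2.0) -> float:
    """Undo a sparsity multiplier: with half the elements zeroed,
    the dense rate is the quoted 'effective' rate divided by 2."""
    return effective_tflops / sparsity_factor


M3_MAX_FP32 = 14.0               # TFLOPS, as cited in the comment
M3_ULTRA_FP32 = 2 * M3_MAX_FP32  # ASSUMPTION from the comment: Ultra = 2x Max
BLACKWELL_FP32 = 80.0            # TFLOPS, as cited in the comment

fp32_ratio = throughput_ratio(M3_ULTRA_FP32, BLACKWELL_FP32)
print(f"M3 Ultra vs Blackwell, FP32: {fp32_ratio:.0%}")

# If the quoted 20 petaFLOPs FP4 figure assumes 2:1 sparsity, the dense rate is:
fp4_dense_pflops = dense_from_effective(20.0)
print(f"FP4 dense equivalent: {fp4_dense_pflops:.0f} petaFLOPs")
```

Note that the result depends entirely on which precision is compared: FP32 favors the Mac far more than the low-precision formats that AI accelerators are designed around.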

link

whimsicalism 474 days ago

TFLOPS are teraflops, not “tensor flops”.

Blackwell and modern AI chips are built for FP16. B100 has 1750 TFLOPS of FP16. M3 Ultra has ~80 TFLOPS of FP16, or about 4.6% that of B100.

link

PaulHoule 476 days ago

That article says you can connect them through the Thunderbolt 5 somehow to form clusters.

link

burnerthrow008 476 days ago

I wonder if that’s something new, or just the same virtual network interface that’s been around since the TB1 days (a new network interface appears when you connect two Macs with a TB cable)

link

jauntywundrkind 475 days ago

It's the same host-to-host USB network, I believe.

I'm super interested in the clustering capability. At launch people said they were only getting like 11Gbps from their TB4 drive arrays, which was really way less than expected.

Apple does kind of advertise that each TB port has its own controller, which gives me hope that whatever one port can do, six ports can do six times over.

AMD's Strix Halo victory feels much shallower today. Eventually 48GB or 64GB sticks will probably expand Strix Halo to 192GB, then 256GB. But Strix Halo is super, super I/O-starved: it has basically a desktop's worth of I/O, with no easy way to connect host-to-host, and Apple absolutely understands that the usefulness of a chip is bounded by what it can connect to. 6x TB5, if even half true, will be utterly outstanding.

It's been so so so so cool to see Non-Transparent Bridging atop Thunderbolt, so one host can act like a device. Since it's PCIe, that hypothetically would allow amazing RDMA over TB. USB4 mandates host-to-host networking, but I have no idea how it is implemented and I suspect it's nowhere near as close to the metal.

link

PaulHoule 475 days ago

In 2017 I was working for a company that was trying to develop foundation models, and I was developing a framework for training what were then large neural networks [1] and other models.

It was "yet another Mac-oriented startup", but I had them get me an Alienware laptop because I could get one with a 1070 mobile card, which meant I could train on my laptop whereas the data scientists had to do everything on our DGX-1. [2]

Today it is the other way around, the Mac Studio looks like the best AI development workstation you can get.

[1] I was really partial to a character-level CNN model we had

[2] CEO presented next to Jensen Huang at a NVIDIA conference, his favorite word was "incredible". I thought it was "incredible" when I heard they got bought by Nike, but it was true.

link

PaulHoule 476 days ago

Well already it is faster than GigE...

https://arstechnica.com/gadgets/2013/10/os-x-10-9-brings-fas...

Thunderbolt is PCIe-based and I could imagine it being extended to do what https://en.wikipedia.org/wiki/Compute_Express_Link and https://en.wikipedia.org/wiki/InfiniBand do.

link