| HN Mirror



	by lhl 476 days ago
	Since no one specifically answered your question yet: yes, you should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has 37B active parameters per pass. You'd probably expect in the ballpark of 20-30 tok/s for text generation (depending on how much of the memory bandwidth, MBW, can actually be utilized). From my napkin math, the M3 Ultra's TFLOPs are still relatively low (around 43 FP16 TFLOPs?), but that should be more than enough to handle bs=1 token generation (which should be way <10 FLOPs/byte for inference). Now as far as its prefill/prompt processing speed... well, that's another matter.
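The napkin math above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 404GB, 671B, 37B and 43 TFLOPs figures come from the comment, while the 819 GB/s peak memory bandwidth is an assumption not stated in the thread, and the function names are made up for this sketch. It also pretends the quantization is uniform across all parameters and ignores KV-cache reads, so treat the results as a ceiling rather than a prediction.

```python
# Back-of-the-envelope estimate for bandwidth-bound token generation (bs=1).
# Figures quoted in the comment:
MODEL_BYTES = 404e9      # Q4_K_M GGUF file size
TOTAL_PARAMS = 671e9     # total parameters in the MoE
ACTIVE_PARAMS = 37e9     # parameters active per token
PEAK_FLOPS = 43e12       # "around 43 FP16 TFLOPs?" per the comment
# ASSUMPTION (not from the thread): peak memory bandwidth of the M3 Ultra.
PEAK_BANDWIDTH = 819e9   # bytes per second


def bytes_per_token(model_bytes=MODEL_BYTES,
                    total_params=TOTAL_PARAMS,
                    active_params=ACTIVE_PARAMS):
    """Weight bytes read per generated token, assuming uniform quantization."""
    return model_bytes / total_params * active_params


def decode_tokens_per_second(utilization, bandwidth=PEAK_BANDWIDTH):
    """Upper bound on tok/s if decoding is limited purely by weight reads."""
    return bandwidth * utilization / bytes_per_token()


def arithmetic_intensity(active_params=ACTIVE_PARAMS):
    """FLOPs per byte for decoding: ~2 FLOPs (multiply + add) per active weight."""
    return 2 * active_params / bytes_per_token(active_params=active_params)


def machine_balance(peak_flops=PEAK_FLOPS, bandwidth=PEAK_BANDWIDTH):
    """FLOPs the chip can perform per byte it can load at peak."""
    return peak_flops / bandwidth


if __name__ == "__main__":
    print(f"bytes per token: {bytes_per_token() / 1e9:.1f} GB")
    for u in (0.6, 0.8, 1.0):
        print(f"utilization {u:.0%}: {decode_tokens_per_second(u):.1f} tok/s")
    print(f"arithmetic intensity: {arithmetic_intensity():.1f} FLOPs/byte")
    print(f"machine balance:      {machine_balance():.1f} FLOPs/byte")
```

Under these assumptions each token touches roughly 22GB of weights, so 60-80% bandwidth utilization lands at about 22-29 tok/s, in line with the 20-30 tok/s ballpark. The arithmetic intensity comes out near 3 FLOPs/byte, far below what the chip can sustain per byte loaded, which is why generation is bandwidth-bound while prefill (many tokens per weight read) is the compute-bound part.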


lynguist 475 days ago

I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.

tgma 475 days ago

Pretty sure this has absolutely nothing to do with Deepseek, or even local LLMs at large, which have been a thing for a while and an obvious use case since the original Llama leak and llama.cpp came around.

Fact is, Mac Pros in the Intel days supported 1.5TB of RAM in some configurations[1], and that was what their high-end customer base expected 6 years ago. They needed to address the gap for those customers, so they would have shipped such a product regardless. Local LLM is the cherry on top. Deepseek in particular almost certainly had nothing to do with it. They will still need to double the supported RAM in their SoC to get there. Perhaps in a Mac Pro or a different quad-Max-glued chip.

[1]: https://support.apple.com/en-us/101639

saagarjha 475 days ago

The thing that people are excited about here is unified memory that the GPU can address. Mac Pro had discrete GPUs with their own memory.

tgma 475 days ago

I understand why they are excited about it—just pointing out it is a happy coincidence. They would have and should have made such a product to address the need of RAM users alone, not VRAM in particular, before they have a credible case to cut macOS releases on Intel.

water9 475 days ago

Intel integrated graphics technically also used unified memory, with standard DRAM.

kergonath 475 days ago

Those also have terrible performance and worse bandwidth. I am not sure they are really relevant, to be honest.

McDaveNZ 475 days ago

Did the Xeons in the Mac Pro even have integrated graphics?

icedchai 474 days ago

So did the Amiga, almost 40 years ago...

vaxman 474 days ago

You mean this? ;) http://de.wikipedia.org/wiki/Datei:Amiga_1000_PAL.jpg

RIP Jay Miner who watched his unified memory daughters Agnus, Denise and Paula be slowly murdered by Jack Tramiel's vengeance against Irving Gould. [Why couldn't the shareholders have stormed their boardroom 180 days before the company ran out of cash, installed interim management who, in turn, would have brought back the megalomaniac Founder that would, until his dying breath, keep spreading their cash to the super brilliant geniuses that made all the magic chips happen and then turn the resulting empire over to ops people to make their workplace so uncomfortable they all retire early and live happily ever after on tropical islands and snowy mountain tops?]

kmacdough 475 days ago

That or it's the luckiest coincidence! In all seriousness, Apple is fairly consistent about not pushing specs that don't matter and >256GB is just unnecessary for most other common workloads. Factors like memory bandwidth, core count and consumption/heat would have higher impact.

That said, I doubt it was explicitly for R1, but rather based on where the industry was a few years ago, when GPT-3's 175B was SOTA but the industry was still looking to go larger. "As much memory as possible" is the name of the game for AI in a way that's not true for other workloads. It may not be true for AI forever either.

icedchai 474 days ago

The high end Intel Macs supported over a TB of RAM, over 5 years ago. It's kinda crazy Apple's own high end chips didn't support more RAM. Also, the LLM use case isn't new... Though DeepSeek itself may be. RAM requirements always go up.

teknologist 473 days ago

Just to clarify. There is an important difference between unified memory, meaning accessible by both CPU and GPU, and regular RAM that is only accessible by CPU.

angoragoats 471 days ago

As mentioned elsewhere in this thread, unified memory existed long before Apple released the M1, and in fact many Intel processors that Apple used before supported it (though the Mac Pros that supported 1.5TB of RAM did not, as they did not have integrated graphics).

The presence of unified memory does not necessarily make a system better. It’s a trade off: the M-series systems have high memory bandwidth thanks to the large number of memory channels, and the integrated GPUs are faster than most others. But you can’t swap in a faster GPU, and when using large LLMs even a Mac Studio is quite slow compared to using discrete GPUs.

brookst 475 days ago

Design work on the Ultra would have started 2-3 years ago, and specs for memory at least 18 months ago. I’m not sure they had that kind of inside knowledge for what Deepseek specifically was doing that far in advance. Did Deepseek even know that long ago?

happyopossum 475 days ago

> they specifically built this M3 Ultra for DeepSeek R1 4-bit

Which came out in what, mid January? Yeah, there's no chance Apple (or anyone) has built a new chip in the last 45 days.

tempaccount420 471 days ago

Don't they build these Macs just-in-time? The bandwidth doesn't change with the RAM, so surely it couldn't have been that hard to just... use higher capacity RAM modules?

vaxman 474 days ago

"No chance?" But it has been reported that the next generation of Apple Silicon started production a few weeks ago. Those deliveries may enable Apple to release its remaining M3 Ultra SKUs for sale to the public (because it has something Better for its internal PCC build-out).

It also may point to other devices ᯅ depending upon such new Apple Silicon arriving sooner, rather than later. (Hey, I should start a YouTube channel or religion or something. /s)

SV_BubbleTime 475 days ago

No one is saying they built a new chip.

But the decision to come to market with a 512GB sku may have changed from not making sense to “people will buy this”.

cyanydeez 474 days ago

Dies are designed in years.

This was just a coincidence.

SV_BubbleTime 474 days ago

What part of “no one is saying they designed a new chip” is lost here?

cyanydeez 474 days ago

Sorry, none of us are fanboys trying to shape "Apple is great" narratives.

forrestthewoods 475 days ago

I don’t think you understand hardware timelines if you think this product had literally anything to do with anything DeepSeek.

reitzensteinm 475 days ago

Chip? Yes. Product? Not necessarily...

It's not completely out of the question that the 512GB version of the M3 Ultra was built for their internal Apple silicon servers powering Private Cloud Compute, but not intended for consumer release, until a compelling use case suddenly arrived.

I don't _think_ this is what happened, but I wouldn't go as far as to call it impossible.

forrestthewoods 475 days ago

DeepSeek R1 came out Jan 20.

Literally impossible.

reitzensteinm 475 days ago

The scenario is that the 512gb M3 Ultra was validated for the Mac Studio, and in volume production for their servers, but a business decision was made to not offer more than a 256gb SKU for Mac Studio.

I don't think this happened, but it's absolutely not "literally impossible". Engineering takes time, artificial segmentation can be changed much more quickly.

forrestthewoods 475 days ago

From “internal only” to “delivered to customers” in 6 weeks is literally impossible.

jahewson 475 days ago

That's absurd. Fabbing custom silicon is not something anybody does for a few thousand internal servers. The unit economics simply don't work. Plus Apple is using OpenAI to provide its larger models anyway, so the need never even existed.

brookst 475 days ago

Apple is positively building custom servers, and quantities are closer to the 100k range than 1000 [0]

But I agree they are not using the M3 Ultra for that. It wouldn’t make any sense.

0. https://www.theregister.com/AMP/2024/06/11/apple_built_ai_cl...

teknologist 473 days ago

That could be why they're also selling it as the Mac Studio M3 Ultra

bustling-noose 475 days ago

My thoughts too. This product was in the pipeline maybe 2-3 years ago. Maybe with LLMs getting popular a year ago they tried to fit more memory but it’s almost impossible to do that that close to a launch. Especially when memory is fused not just a module you can swap.

tgma 475 days ago

Your conclusion is correct but to be clear the memory is not "fused." It's soldered close to the main processor. Not even a Package-on-Package (two story) configuration.

See photo without heatspreader here: https://wccftech.com/apple-m2-ultra-soc-delidded-package-siz...

bustling-noose 473 days ago

I think by "fused" I meant it's stuck onto the SoC module, not part of the SoC as I may have worded it. While you could maybe still add NANDs later in the manufacturing process, it's probably not easy, especially if you need more NANDs and a larger module, which might cause more design problems. The NAND is closer because the controller is in the SoC. So the memory controller would probably also change with higher memory sizes, which would mean this cannot be a last-minute change.

fennecfoxy 471 days ago

Sheesh, the...comments on that link.

nightski 475 days ago

$10k to run a 4 bit quantized model. Ouch.

OriginalMrPink 475 days ago

That's today. What about tomorrow?

water9 475 days ago

The M4 MacBook Pro 128GB can run a 32B parameter model with 8-bit quantization just fine.

jrflowers 474 days ago

> they specifically built this M3 Ultra for DeepSeek R1 4-bit.

This makes sense. They started gluing M* chips together to make Mac Studios three years ago, which must have been in anticipation of DeepSeek R1 4-bit

a1o 475 days ago

Any ideas on power consumption? I wonder how much power it would use. It looks like it would be more efficient than everything else that currently exists.

j45 475 days ago

Looks like up to 480W listed here

https://www.apple.com/mac-studio/specs/

a1o 475 days ago

Thanks!!

ryao 475 days ago

The M2 Ultra Mac Pro could reach a maximum of 330W according to Apple:

https://support.apple.com/en-us/102839

I assume it is similar.

drited 476 days ago

I would be curious what context window size could be expected when generating in the ballpark of 20 to 30 tokens per second using DeepSeek-R1 Q4 on this hardware.
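One way to frame the memory side of this question is a simple budget: whatever is left after the 404GB of weights has to hold the KV cache, the OS, and everything else. The sketch below is a rough illustration only. The 512GB and 404GB figures come from the thread; `kv_bytes_per_token` and `reserved_gb` are hypothetical placeholders, since the real per-token cache size depends on how the inference runtime stores attention state for this model.

```python
# Rough context-length budget: memory left after weights, divided by the
# KV-cache cost of one token. Only the two constants below are from the thread.
TOTAL_MEMORY_GB = 512   # largest Mac Studio configuration discussed
MODEL_GB = 404          # Q4_K_M GGUF size quoted in the thread


def max_context_tokens(kv_bytes_per_token, reserved_gb=0.0,
                       total_gb=TOTAL_MEMORY_GB, model_gb=MODEL_GB):
    """Tokens of context that fit in the memory left over after the weights.

    kv_bytes_per_token: HYPOTHETICAL cache cost per token (runtime-dependent).
    reserved_gb: HYPOTHETICAL allowance for the OS, buffers, and other apps.
    """
    headroom_bytes = (total_gb - model_gb - reserved_gb) * 1e9
    if headroom_bytes <= 0:
        return 0
    return int(headroom_bytes // kv_bytes_per_token)


if __name__ == "__main__":
    # Placeholder inputs, not measurements: 1 MB of cache per token, 28 GB reserved.
    print(max_context_tokens(kv_bytes_per_token=1_000_000, reserved_gb=28))
```

This only answers whether a given context fits in memory. Generation speed also drops as the context grows, because the cache has to be read for every new token on top of the active weights.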