| HN Mirror

by dagmx 52 days ago

This feels like fluff on the part of the author (whose work I don’t want to trivialize); I don’t think they’ve actually looked deeper than a paper spec sheet on this.

1. Yes, it has the same number of cores as a 5070 mobile. It’s also running at a shared peak of 2/3 the bandwidth and a shared peak of 2/3 the TDP. The GPU by itself will likely perform at half the dedicated unit’s performance.

2. Apple may not have SVE2 but they do have the AMX (private) and SME. I don’t see why he thinks the SVE2 will give him more performance than the SME.

3. He mentions a single core type but doesn’t mention the total makeup. We have already known for a year how the DGX Spark compares to Apple chips. For CPU it’s roughly equivalent to an M3 Pro, and for GPU compute (not rasterization) it’s between an M4 Pro and M4 Max, without considering bandwidth.

The real advantage to these is that they run CUDA. That’s it. Otherwise when they launch they’ll be 2-3 generations behind where Apple is and 1 gen behind AMD.

The other superpower of the DGX Spark was the NIC for pairing them together. But that’s been removed here too.

4 comments

storus 52 days ago

> GPU compute (not rasterization) it’s between an M4 Pro and M4 Max without considering bandwidth

You are likely thinking of token generation, which depends on memory bandwidth, where Apple has an edge. The Spark's GPU compute is way higher than even the M5 Max's (17 FP32 TFLOPS), at around 2x the FP32 TFLOPS. It's literally 6144 CUDA cores, like the desktop 5070, held back by slower memory and a lower TDP (29.7 vs. 31 FP32 TFLOPS on the 5070).
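The peak figures quoted here follow from simple arithmetic: peak FP32 throughput is roughly cores × FLOPs per core per cycle × clock. A minimal sketch that backs out the clock each quoted figure would imply. The function names are illustrative, and the factor of 2 FLOPs per core per cycle (one fused multiply-add) is the usual convention for vendor peak numbers, assumed here rather than taken from the thread.

```python
# Assumption: each CUDA core retires one fused multiply-add per cycle,
# which vendors count as 2 floating-point operations.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    # cores * FLOPs/cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return cuda_cores * FLOPS_PER_CORE_PER_CYCLE * clock_ghz / 1000.0


def implied_clock_ghz(tflops: float, cuda_cores: int) -> float:
    """Clock (GHz) that a quoted peak TFLOPS figure implies for a core count."""
    return tflops * 1000.0 / (cuda_cores * FLOPS_PER_CORE_PER_CYCLE)


if __name__ == "__main__":
    cores = 6144  # core count quoted in the comment
    for label, tflops in [("Spark", 29.7), ("desktop 5070", 31.0)]:
        clock = implied_clock_ghz(tflops, cores)
        print(f"{label}: {tflops} TFLOPS implies about {clock:.2f} GHz")
```

With the same core count, the gap between the two peak numbers comes down entirely to the implied clock, which is why sustained power limits matter more than the spec-sheet figure.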

dagmx 52 days ago

That’s only if you consider FP32 specifically. On average the M5 Max will pull ahead for tasks like GPU raytracing (it’s currently the fastest mobile GPU for Blender rendering) and token generation and other things that benefit from the higher memory bandwidth.

I’d also mention that you’re comparing peaks which the RTX Spark won’t be hitting. The top TDP is less than that of the DGX Spark.

I just think anyone calling this a beast and a game changer is conflating/extrapolating from different form factors and constraints.

well_ackshually 52 days ago

> fastest mobile GPU for Blender rendering

Cool story, but nobody cares about mobile GPUs for Blender. A 4080 eats an M5 Max alive for breakfast. The 5080 in my machine that cost me 1500€ runs circles around an M5 Max that would cost me over 6000€. And when the 5080 isn't enough in 5 years, I can upgrade it to a 7080 or whatever, which will remain compatible.

If you're a professional, soldered products like the RTX Spark or Apple's offering are a dead end. They are literally never worth it.

dagmx 51 days ago

As a professional in the space, I can say a ton of people DO care about mobile performance. If you’d gone to SIGGRAPH in the last few years, you’d have seen how the landscape has really changed.

It’s not going to be the primary place of creation, but there’s a lot of usefulness in having a portable workstation, or that entire segment of the laptop workspace wouldn’t exist.

In either case, it’s beside the point, because the discussion is about the compute levels of a GPU in the same form factor.

Foobar8568 52 days ago

And nobody cares about 5080.

well_ackshually 52 days ago

Exactly. A 5080 is just an enthusiast gamer card. If you're part of a large company that requires you to run Blender/3ds Max/etc. (read: Disney/Pixar-sized), you're going to have an A6000 in there, or even just a render farm.

Game dev & asset work is probably happy with a 5080 and that's what most rendering/dev machines would have.

The addressable market of "i have 6000 to blow and i need meh performance on anything related to 3D rendering" is small, and benchmarks make it look bigger than it really is.

dagmx 51 days ago

Ironically the two studios you mentioned don’t actively render on GPUs and it’s an area which shows that even these small SoCs can punch way above their weight if you look at their pure compute power.

Disney’s Hyperion is CPU based and RenderMan XPU is just exiting beta after over a decade.

But while they do stack their workstations with higher-end GPUs for artist throughput in viewports, it’s mostly just for the higher memory to fit unoptimized scenes in. At none of the studios or major films I’ve worked on have the on-desk artists been gated by raster rate, only by memory.

But again, that’s beside the point, because it’s still a valuable metric when comparing perf between similar chipsets.

There are already more creatives using their consumer grade hardware to make stuff. And even the studios you mentioned do actually use laptops on the go for parts of their creation pipelines for various things like virtual production scouting etc.

cthalupa 52 days ago

Prefill is another advantage vs. Apple. It's way way way way faster on a Spark than it is even on an M5 Max.

Same model, same quant, same query, with settings matched as closely as I can get from vLLM, and for workloads with large prompts + low cacheability, one of my Sparks will often be done responding before the MBP is done with prefill.
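The prefill vs. token generation split that keeps coming up in this thread can be sketched with a rough roofline-style estimate. Generation re-reads the model weights for every token, so it is bounded by memory bandwidth; prefill pushes the whole prompt through batched matrix multiplies, so it is bounded by compute. Every number below is a hypothetical placeholder, not a measurement of a Spark or an M5 Max, and the 2 FLOPs per parameter per token figure is a common rule of thumb for dense models that ignores attention cost.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on generation speed: one full weight read per token."""
    return bandwidth_gb_s / model_size_gb


def prefill_seconds(prompt_tokens: int, params_billion: float,
                    compute_tflops: float) -> float:
    """Lower bound on prefill time, assuming ~2 FLOPs per parameter per token."""
    total_flops = 2.0 * params_billion * 1e9 * prompt_tokens
    return total_flops / (compute_tflops * 1e12)


if __name__ == "__main__":
    # Hypothetical machines: A has more bandwidth, B has more compute.
    machines = {
        "A (bandwidth-heavy)": {"bandwidth_gb_s": 500.0, "compute_tflops": 15.0},
        "B (compute-heavy)": {"bandwidth_gb_s": 250.0, "compute_tflops": 30.0},
    }
    model_size_gb = 10.0      # hypothetical quantized weights
    params_billion = 20.0     # hypothetical parameter count
    prompt_tokens = 8000      # large, uncached prompt

    for name, m in machines.items():
        tps = decode_tokens_per_s(m["bandwidth_gb_s"], model_size_gb)
        pre = prefill_seconds(prompt_tokens, params_billion, m["compute_tflops"])
        print(f"{name}: prefill >= {pre:.1f} s, decode <= {tps:.0f} tok/s")
```

Under these made-up inputs, the bandwidth-heavy machine generates tokens twice as fast but takes twice as long to get through the prompt, which is the shape of the trade-off described above for large, low-cacheability prompts.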

llm_nerd 52 days ago

It is absolutely fluff, and the only reason this worthless tweet is on the front page of HN is that this audience has a habit of canonizing certain people, and treating each of their bowel movements as prophetic.

Guy suddenly became aware of a chip that the rest of the industry long knew about, seems completely unaware of the competitors, and posts about how it's a BEAST and will be a GAME CHANGER.

Like the DGX Spark was a game changer? Eh, it has mostly been a massive disappointment. An overpriced nvidia laptop isn't going to change the equation an iota.

trympet 52 days ago

Yes. This reads like a LinkedIn post rehashing old news. I’m not even in the industry.

wmf 52 days ago

Lemire is very narrowly interested in CPU SIMD so within that niche it may be interesting. As you said, overall the Spark is good but not great.

oofbey 52 days ago

I cannot fathom why he brings up CPU SIMD as a potential comparative weakness on the NVIDIA Spark when it has teraflops of CUDA sitting right there.

well_ackshually 52 days ago

Because you won't run a sorting algorithm that executes every frame as a CUDA kernel. CPU performance matters more than however many TFLOPS of CUDA you have on hand as soon as you do silly things like "run an OS" and "use your PC for anything but shitting out tokens".
