| HN Mirror



	by comandillos 17 days ago
	So they have basically reused the same hardware as in the DGX Spark (GB10)... That chip isn't great for LLM inference actually. https://www.techpowerup.com/gpu-specs/gb10.c4342 https://www.nvidia.com/en-us/products/rtx-spark/

3 comments

ewklwekl 17 days ago

It is great for single-user, single-session inference. It is not a replacement for a graphics accelerator that runs several concurrent inference sessions in parallel.

Basically the same tradeoff as a Mac mini with unified memory.


general_reveal 17 days ago

The RTX GPU laptops run very hot. Even though they are pound for pound better, they just run too hot for local LLM usage, for me at least. I prefer Macs for this. A lot of AMD cards also run cooler. I wonder if undervolting would help with smaller models and heat.


comandillos 17 days ago

I mean the GB10 is pretty efficient for the power it has, but imho it is nowhere near the power efficiency of Apple Silicon (it was never intended to be a chip used for mobile devices). I guess this is kind of the move Apple made with the A12Z and the Mini but... the other way around?

I think it's gonna be another failure, as we're used to seeing in the PC market these days.


joe_mamba 17 days ago

>That chip isn't great for LLM inference actually.

Why do I have the feeling it's been intentionally made to be bad in order to get you onto their most expensive datacenter gear?


ekidd 17 days ago

It's probably more that LLM inference speed comes from having a large amount of fast RAM. And fast RAM is brutally expensive right now.

At this point, your cost-efficient options include used 3090s, "frankenrigs" using recycled data center cards, and a handful of "workstation" class cards, where the originally high margins and the long enterprise purchasing cycles have kept prices from going up too fast.

In contrast, a lot of these "personal" AI systems are basically a GPU-like core wired to larger amounts of slow RAM. Which is still semi-affordable. Generally speaking, they make for OK chatbots but extremely slow coding agents. Whereas you can run a modestly useful coding agent at reasonable speed on a 3090.
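A rough way to see why the RAM speed matters so much: when generating one token at a time for a single user, the model weights have to be read from memory for every token, so memory bandwidth divided by model size gives a ceiling on tokens per second. The sketch below is only a back-of-the-envelope estimate. The function name, the two bandwidth figures, and the 18 GB model size are illustrative assumptions, not measurements of any system in this thread, and real throughput will be lower.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes every generated token requires reading all weights once,
    so throughput is capped at bandwidth / model size. Ignores compute,
    KV-cache reads, and batching.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Illustrative, assumed numbers (not benchmarks).
MODEL_SIZE_GB = 18.0  # e.g. a quantized model that fits in 24 GB of VRAM
systems = {
    "fast-VRAM card (assumed 900 GB/s)": 900.0,
    "unified-memory box (assumed 270 GB/s)": 270.0,
}

for name, bandwidth in systems.items():
    ceiling = decode_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

The slower system can hold a much bigger model, but a bigger model lowers the ceiling further, which is one reason these boxes can feel fine for chat and sluggish for agents that generate a lot of tokens.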

So yeah, a lot of these systems are a bit scammy. But not because it's a secret conspiracy to protect data center cards. Rather, there simply isn't enough fast RAM in the entire world. So they'll flog you disappointingly slow RAM instead.

TL;DR: Might be useful for some use cases, but benchmark very carefully.
