

	by bigwheels 290 days ago
	How does LPDDR5 (this Xe3P) compare with GDDR7 (Nvidia's flagships) when it comes to inference performance? Local inference is an interesting proposition because today, in real life, the NV H300 and AMD MI300 clusters are operated by OpenAI and Anthropic in batching mode, which slows users down as they're forced to wait for enough similar-sized queries to arrive. For local inference, no waiting is required, so you could potentially get higher throughput.

3 comments

freeqaz 289 days ago

I think the better comparison, for consumers, is how fast is LPDDR5 compared to the normal DDR5 attached to your CPU?

Or, to be more specific, what is the speed when your GPU is out of RAM and it's reading from main memory over the PCI-E bus?

PCI-E 5.0: 64GB/s @ 16x or 32GB/s @ 8x

2x 48GB (96GB) of DDR5 in an AM5 rig: ~50GB/s

Compared with those numbers, the ~300GB/s+ possible with a card like this is a lot faster for large 'dense' models. Yes, even an NVIDIA 3090 has ~900GB/s of bandwidth, but it's only 24GB, so even a card like this Xe3P is likely to 'win' because of the higher memory available.

Even if it's 1/3rd of the speed of an old NVIDIA card, it's still 6x+ the speed of what you can get in a desktop today.
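
To put rough numbers on that comparison, here is a minimal sketch of the usual back-of-the-envelope rule for dense models: if generation is memory-bandwidth bound, each token has to stream all the weights once, so tokens/sec is at most bandwidth divided by model size. The bandwidth figures are the ones quoted above; the 35 GB model size (roughly a 70B model at ~4 bits per weight) and the function name are assumptions for illustration only, and the estimate ignores KV cache traffic and compute.

```python
# Rough ceiling on decode speed for a dense model when generation is
# memory-bandwidth bound: every generated token streams all weights once.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-limited ceiling on tokens/sec (ignores KV cache and compute)."""
    return bandwidth_gb_s / model_size_gb

# Bandwidth figures quoted in the comment above (GB/s).
BANDWIDTH_GB_S = {
    "DDR5 on AM5": 50,
    "PCI-E 5.0 x16": 64,
    "Xe3P-class card": 300,
    "NVIDIA 3090": 900,
}

# Hypothetical model size: ~70B parameters at ~4 bits per weight.
MODEL_SIZE_GB = 35

for name, bw in BANDWIDTH_GB_S.items():
    ceiling = max_tokens_per_sec(bw, MODEL_SIZE_GB)
    print(f"{name:>16}: ~{ceiling:.1f} tokens/s ceiling")
```

Note that the 3090 row is only a bandwidth ceiling: a 35 GB model would not fit in its 24GB, which is the point about capacity made above.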


MrBuddyCasino 289 days ago

This doesn’t matter at all, if the resulting tokens/sec is still too slow for interactive use.


halJordan 290 days ago

LPDDR5X (not LPDDR5) is 10.7 Gbps per pin. GDDR7 is 32 Gbps per pin. So it's going to be slower.
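
Per-pin rates only turn into total bandwidth once you multiply by the bus width, so here is a small sketch of that arithmetic. The per-pin rates are the ones above; the bus widths (256-bit and 512-bit) are assumptions picked for illustration, not confirmed specs for any particular card.

```python
def peak_bandwidth_gb_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbit/s) x bus width (bits) / 8."""
    return gbps_per_pin * bus_width_bits / 8

# Per-pin data rates from the comment above.
LPDDR5X_GBPS = 10.7
GDDR7_GBPS = 32.0

# Assumed bus widths, for illustration only.
lpddr5x_bw = peak_bandwidth_gb_s(LPDDR5X_GBPS, bus_width_bits=256)
gddr7_bw = peak_bandwidth_gb_s(GDDR7_GBPS, bus_width_bits=512)

print(f"LPDDR5X @ 256-bit: {lpddr5x_bw:.1f} GB/s")
print(f"GDDR7   @ 512-bit: {gddr7_bw:.1f} GB/s")
print(f"Per-pin ratio: {GDDR7_GBPS / LPDDR5X_GBPS:.2f}x")
```

At equal bus width the gap is just the per-pin ratio (about 3x); a wider bus on either side changes the total accordingly.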


codedokode 289 days ago

Yes, but in matrix multiplication there are O(N²) numbers and O(N³) multiplications, so you might be bound by compute speed.


electroglyph 289 days ago

both are equally important. compute for prefill and mem bandwidth for generation
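
The two points above (O(N²) data vs O(N³) work, and prefill vs generation) can be sketched as an arithmetic-intensity estimate: FLOPs per byte moved for one N x N weight matrix applied to a batch of token vectors. With many tokens at once (prefill) the weights are reused and intensity is high; with one token (generation) it is about 1 FLOP per byte at 16-bit weights. The hardware numbers (100 TFLOPS, 300 GB/s) and the layer size are assumptions for illustration, not specs of any real card.

```python
def arithmetic_intensity(n: int, batch: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte for a (batch x n) @ (n x n) matmul.

    Counts 2*batch*n*n FLOPs, and traffic for the weights plus the
    input and output activations (each read or written once).
    """
    flops = 2 * batch * n * n
    bytes_moved = (n * n + 2 * batch * n) * bytes_per_value
    return flops / bytes_moved

def is_compute_bound(intensity: float, peak_flops: float, bandwidth_bytes_s: float) -> bool:
    """Compute bound if intensity exceeds the machine balance (FLOPs per byte)."""
    return intensity > peak_flops / bandwidth_bytes_s

# Hypothetical accelerator: 100 TFLOPS of compute, 300 GB/s of memory bandwidth.
PEAK_FLOPS = 100e12
BANDWIDTH = 300e9
N = 8192  # hypothetical layer width

for label, batch in [("generation (1 token)", 1), ("prefill (2048 tokens)", 2048)]:
    ai = arithmetic_intensity(N, batch)
    bound = "compute" if is_compute_bound(ai, PEAK_FLOPS, BANDWIDTH) else "memory"
    print(f"{label}: {ai:.1f} FLOP/byte -> {bound} bound")
```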


qingcharles 290 days ago

I asked GPT to pull real stats on both. Looks like the 50-series memory bandwidth is about 3x that of the Xe3P, but it wanted to remind me that this new Intel card is designed for data centers and is much lower power, and that the comparable Nvidia server cards (e.g. H200) have even faster memory than GDDR7, so the difference would be even higher for cloud compute.
