|
|
|
|
|
by bigwheels
242 days ago
|
|
How does LPDDR5 (This Xe3P) compare with GDDR7 (Nvidia's flagships) when it comes to inference performance? Local inference is an interesting proposition because today in real life, the NV H300 and AMD MI-300 clusters are operated by OpenAI and Anthropic in batching mode, which slows users down as they're forced to wait for enough similar sized queries to arrive. For local inference, no waiting is required - so you could get potentially higher throughput. |
|
Or, to be more specific, what is the speed when your GPU is out of RAM and it's reading from main memory over the PCI-E bus?
PCI-E 5.0: 64GB/s @ 16x or 32GB/s @ 8x 2x 48GB (96GB) of DDR5 in an AM5 rig: ~50GB/s
Versus the ~300GB/s+ possible with a card like this, it's a lot faster for large 'dense' models. Yes, even an NVIDIA 3090 is ~900GB/s of bandwidth, but it's only 24GB, so even a card like this Xe3P is likely to 'win' because of the higher memory available.
Even if it's 1/3rd of the speed of an old NVIDIA card, it's still 6x+ the speed of what you can get in a desktop today.