

by AnthonyMouse 24 days ago

> For high end AI inference chips, DRAM already goes onto the interposer right next to the GPU to bring the bandwidth as high as possible.

The high-end AI inference chips use HBM and cost tens of thousands of dollars. HBM uses 1024 data pins per stack instead of the 64 of an ordinary DIMM, which is crazy expensive. So to the extent that consumer devices get it at all, it would be in addition to ordinary DRAM rather than instead of it, e.g. you might have 12GB of HBM on the CPU package but then 64GB of less expensive DRAM. Increasing the number of cache hierarchy levels is a long-term trend, and HBM as an L4 cache is pretty plausible for high-end CPUs as a supplement to DRAM rather than a replacement.
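To put rough numbers on why the pin count matters, here is a back-of-the-envelope sketch. Peak bandwidth is just data pins times per-pin rate, divided by 8 to get bytes. The 6.4 Gbit/s per-pin rate is an assumption chosen for illustration (it is held equal on both sides so only the width differs), and the function name is made up for this example.

```python
def peak_bandwidth_gb_s(data_pins: int, gbit_per_s_per_pin: float) -> float:
    """Peak bandwidth in GB/s, assuming every data pin moves one bit per transfer."""
    return data_pins * gbit_per_s_per_pin / 8  # divide by 8: bits -> bytes


# Assumed per-pin rate in Gbit/s; not a figure from the thread.
PIN_RATE = 6.4

dimm = peak_bandwidth_gb_s(64, PIN_RATE)         # one ordinary 64-bit DIMM
hbm_stack = peak_bandwidth_gb_s(1024, PIN_RATE)  # one 1024-bit HBM stack

print(f"64 pins:   {dimm:.1f} GB/s")
print(f"1024 pins: {hbm_stack:.1f} GB/s")
print(f"ratio:     {hbm_stack / dimm:.0f}x")
```

At equal per-pin speed the gap is simply the ratio of the widths, which is why the wide interface is both the attraction and the cost.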

There are already servers that work like this, e.g. Xeon Max has 64GB of HBM but then further supports up to 4TB of DDR5.

Moreover, the AI inference hardware integrates the CPU into the GPU because it's really just a giant GPU. They're not getting some major advantage from that; they just know nobody is going to want to swap out the CPU on a system where the CPU is mostly irrelevant. If you wanted that level of inference performance on a normal PC, one that is also used for other purposes where the CPU actually matters, you would drop an AI accelerator with HBM or GDDR into a PCIe slot.

1 comment

LarsDu88 24 days ago

I think the long term trend is typically the high end technology of today will be the mid to low tier technology of the future.

If connecting 1024 data pins through a nanoscale-manufactured silicon interposer seems complicated and expensive right now, that doesn't mean we won't see it in tomorrow's consumer devices. If anything, we are MORE likely to see it one day. Apple and other companies are gradually working towards moving AI models to be more local which means memory bandwidth has a real killer app use case right now. Witness Liquid AI and their partnership with Mercedes Benz to put 8B param LLM models into vehicles.

Both Desktop PCs and the CPU are becoming less and less relevant as we move further in the decade to be honest...

AnthonyMouse 23 days ago

> I think the long term trend is typically the high end technology of today will be the mid to low tier technology of the future.

The trend doesn't look like that. The PCI bus from 1992 had 124 pins. PCIe 5.0 x16 has 164 pins; x8, at 98, has even fewer pins than the slots from decades ago. Guess how many pins Thunderbolt has. (Thunderbolt 3 and later use the 24-pin USB-C connector.) DDR1 DIMMs from the year 2000 had 184 pins; DDR5 has 288. The number of pins goes up very slowly if at all, because it's one of the most expensive ways to increase performance, despite being effective.

Which is why the enterprise hardware has always done it and the consumer hardware hasn't.
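For a sense of scale, the pin counts quoted above can be turned into growth ratios. This is only arithmetic on the numbers already in this thread; the variable and function names are made up for the example.

```python
# (older part, its pins, newer part, its pins), all figures as quoted in the thread
consumer_pairs = [
    ("PCI (1992)", 124, "PCIe 5.0 x16", 164),
    ("DDR1 DIMM", 184, "DDR5 DIMM", 288),
]


def pin_growth(old_pins: int, new_pins: int) -> float:
    """How many times more pins the newer part has than the older one."""
    return new_pins / old_pins


for old_name, old_pins, new_name, new_pins in consumer_pairs:
    print(f"{old_name} -> {new_name}: {pin_growth(old_pins, new_pins):.2f}x")

# For contrast: HBM's 1024 data pins against the 64 of an ordinary DIMM.
print(f"64 data pins -> 1024 data pins: {pin_growth(64, 1024):.0f}x")
```

Decades of consumer interfaces grew their pin counts by well under 2x, while going from ordinary DRAM to HBM is a 16x jump in data pins in one step.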

> Apple and other companies are gradually working towards moving AI models to be more local which means memory bandwidth has a real killer app use case right now.

The real problem is that ordinary consumers don't want to pay for 128GB of GDDR or HBM, and if they did then you would attach it to the GPU rather than the CPU anyway.

What they might want is the less expensive ordinary DRAM with a wider bus, which is what Apple does, but then you're not using 1024 pins and have no need to solder it instead of using CAMM.

> Witness Liquid AI and their partnership with Mercedes Benz to put 8B param LLM models into vehicles.

8B param models don't need exotic hardware; they run on existing consumer GPUs.
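A rough sketch of the arithmetic behind that. Weight size is parameters times bits per parameter, and single-stream token generation has to read roughly all the weights once per token, so memory bandwidth divided by model size gives a ceiling on tokens per second. The 400 GB/s figure is a hypothetical round number rather than any specific GPU, and the estimate ignores the KV cache, activations and compute limits.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Size of the weights alone in GB (1e9 params * bits / 8 bytes, expressed in GB)."""
    return params_billions * bits_per_param / 8


def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if every token requires reading all weights once."""
    return bandwidth_gb_s / model_gb


# Hypothetical memory bandwidth, assumed purely for illustration.
ASSUMED_BANDWIDTH_GB_S = 400.0

for bits in (16, 8, 4):
    size = weights_gb(8, bits)
    ceiling = decode_tokens_per_s_ceiling(ASSUMED_BANDWIDTH_GB_S, size)
    print(f"8B params at {bits}-bit: {size:.0f} GB, at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions, an 8B model quantized to 8 or 4 bits is a few GB of weights, which is the range existing consumer GPU memory already covers.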

> Both Desktop PCs and the CPU are becoming less and less relevant as we move further in the decade to be honest...

Less relevant to what? Making up for the inefficiency of bad JavaScript with fast hardware? Running the less parallelizable parts of PC games? Databases and other branchy server workloads? They're as relevant as ever to the things they've always been relevant to.