Hacker News
by thefounder 87 days ago
Why don’t they make a GPU optimised for inference/batch jobs with 1 TB of RAM? Everyone wants to run the biggest models locally.

I'm not sure it's really possible.

Take a look at the die shot of a 5090:

http://dieshot.com/wp-content/uploads/2025/03/Dieshot-GB202-...

It has 32 GB of RAM, and the memory controllers are about 10% of the total die area. What would you have to do for 1024 GB of RAM?

Not to mention the price would be astronomical.
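
To put a rough number on that question, here is a back-of-the-envelope sketch. It takes the 32 GB and 10% figures from the comment above and assumes, purely for illustration, that controller area grows linearly with capacity. Real designs can raise capacity per controller (denser chips, more chips per channel), so treat this as a naive upper bound rather than an actual design constraint. The function name is made up for the example.

```python
# Naive scaling of memory-controller area with capacity.
# Inputs come from the comment above; the linear-scaling rule is an assumption.
BASE_CAPACITY_GB = 32          # 5090 memory capacity
BASE_CONTROLLER_SHARE = 0.10   # controllers as a fraction of die area (eyeballed from the die shot)


def controller_area_in_dies(target_gb,
                            base_gb=BASE_CAPACITY_GB,
                            base_share=BASE_CONTROLLER_SHARE):
    """Controller area needed for target_gb, in units of one current-size die."""
    return base_share * (target_gb / base_gb)


area = controller_area_in_dies(1024)
print(f"{area:.1f}x the current die area just for memory controllers")
```

Under that naive assumption the controllers alone would need several times the area of the whole current die, which is why simply adding more of the same memory channels does not work.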

How is Apple packing 512 GB of RAM on their CPU?

IIRC Apple is using the lower channel width options in LPDDR5.

I.e. instead of 64-bit channels they use 16-bit (or maybe 32-bit) ones. That lowers the die area needed for memory controllers.

But it also impacts bandwidth. AFAIK the M3 Ultra (the chip that offers the 512 GB option; there is no M4 Ultra) is at roughly 0.8 TB/s, a bit under half the roughly 1.8 TB/s of something like a 5090.
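
For anyone who wants to sanity-check the bandwidth comparison, theoretical peak bandwidth is just bus width times per-pin data rate, divided by 8 to get bytes. The sketch below assumes a 512-bit GDDR7 bus at 28 Gbit/s per pin for the 5090 and a 1024-bit LPDDR5 bus at 6.4 Gbit/s per pin for an Ultra-class Apple chip. Those figures are quoted from memory, not from this thread, so check them against the vendor spec sheets before relying on them.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbit_per_pin):
    """Theoretical peak bandwidth in GB/s: one pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8


# Assumed figures (see the note above), not measured values:
gpu = peak_bandwidth_gb_s(512, 28.0)    # 5090: 512-bit GDDR7 at 28 Gbit/s per pin
apple = peak_bandwidth_gb_s(1024, 6.4)  # Ultra-class Apple chip: 1024-bit LPDDR5 at 6.4 Gbit/s per pin

print(f"GPU:   {gpu:.0f} GB/s")
print(f"Apple: {apple:.0f} GB/s")
print(f"ratio: {apple / gpu:.2f}")
```

Under those assumptions the Apple part lands at a bit under half the GPU's peak bandwidth, even with a bus twice as wide, because each LPDDR5 pin runs much slower than a GDDR7 pin.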