by AnotherGoodName 143 days ago
And just to add to this: the reason Apple Macs are used is that they have the highest memory bandwidth of any easily obtainable consumer device right now. (Yes, the Nvidia datacenter cards with HBM have even higher memory bandwidth, but they are not easily obtainable.) Memory bandwidth is the limiting factor for inference, more so than raw compute.
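To put a rough number on the bandwidth point, here is a back-of-envelope sketch. It assumes single-stream decoding where every weight is read from memory once per generated token, and it ignores KV cache traffic, batching, and compute time, so treat the result as a ceiling rather than a benchmark. The function name and the example figures are illustrative, not measurements.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billion: float,
                             bytes_per_param: float) -> float:
    """Upper bound on tokens/sec if decoding is purely memory-bandwidth bound.

    Assumption: each generated token requires streaming all model weights
    from memory exactly once (no batching, no KV cache traffic counted).
    """
    # Billions of parameters times bytes per parameter gives model size in GB.
    model_size_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_size_gb


# Illustrative inputs: 800 GB/s of memory bandwidth, a 70B-parameter model
# quantised to 4 bits (0.5 bytes per parameter), i.e. about 35 GB of weights.
print(round(decode_tokens_per_second(800, 70, 0.5), 1))  # 22.9
```

Doubling the compute changes nothing in this estimate; doubling the bandwidth doubles the ceiling, which is the whole argument in one line.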

Memory costs are skyrocketing right now as everyone pivots to using HBM paired with moderate processing power, which is the perfect combination for inference. The current memory situation is obviously temporary. Factories will be built and scaled, and memory is not particularly power hungry; there's a reason you don't really need much cooling for it. As training becomes less of a focus and inference more of one, we will at some point move from the highest-end Nvidia cards to boxes of essentially power-efficient HBM attached to smaller, more efficient compute.

I see a lot of commentary along the lines of "AI companies are so stupid buying up all the memory" around the place at the moment. That memory is what's needed to run inference cheaply. It's currently done on Nvidia cards and Apple M-series chips because those two were the first to offer very high memory bandwidth (HBM on Nvidia's datacenter cards, wide unified memory on Apple's), but the raw compute of the Nvidia cards is really only useful for training. They are just being used for inference right now because there's not much on the market with similar memory bandwidth. But this will be changing very soon. Everyone in the industry is coming along with their own dedicated compute using HBM.