by zozbot234 94 days ago
You need CXL to extend the cache coherency properties of actual RAM over a remote link. That's costly tech. Otherwise, you're relying on the OS (and even the compiler and basic libraries, since fences etc. have to be made OS-visible) to paper over the differences with its own implementation of distributed shared memory. This is known as an 'SSI', or single-system-image, approach; it has significant challenges and is closer in spirit to setting up swap.

I didn't mean anything like that. Just the equivalent of a GPU with the ability to run arbitrary CPU oriented programs.

Of course, GPUs do many tasks very well, but there are also plenty of problems that aren't well suited to them. Well, I suppose I've answered my own question at this point. There probably just aren't enough real-world problems that aren't amenable to running on a GPU while also being either compute or memory bandwidth bound.

Still, the near-monoculture does strike me as odd. I guess GPUs have bifurcated into enterprise versus consumer at this point, but otherwise all we've got is a single CPU example from over a decade ago and a single alternative take on the concept from Fujitsu. Is it just due to the obscene cost of masks for modern process nodes?

Things like that existed in the category of accelerator cards. Xeon Phi (the Knights line) is one example, focused on core count. Some from HP have soldered-on SSDs too. You also had blade servers, which are more focused on that use case, though they're going out of style.

I don't think PCIe is really a good fit for general CPU tasks. You need big heatsinks and a lot of power, and you can't fit that much RAM on board.