by zozbot234
94 days ago
You need CXL to extend the cache coherency properties of actual RAM over a remote link, and that's costly tech. Otherwise, you're relying on the OS (and even the compiler and basic libraries, since fences and the like have to be made OS-visible) to paper over the differences with its own implementation of distributed shared memory. This is known as an 'SSI' or single-system-image approach; it has significant challenges and is closer in spirit to setting up swap.
|
Of course GPUs do many tasks very well, but there are also plenty of problems that aren't well suited to them. Well, I suppose I've answered my own question at this point. There probably just aren't enough real-world problems that aren't amenable to running on a GPU while also being either compute or memory-bandwidth bound.
Still, the near-monoculture does strike me as odd. I guess GPUs have bifurcated into enterprise versus consumer at this point, but otherwise all we've got is a single CPU example from over a decade ago and a single alternative take on the concept from Fujitsu. Is it just due to the obscene cost of masks for modern process nodes?