by derefr 312 days ago
But all of the most ridiculous hyperscale deployments, where bandwidth and latency matter most, have multiple GPUs per CPU, with the CPU responsible for splitting/packing/scheduling models and inference workloads across its own direct-attached GPUs, presenting to the network the abstraction of a single GPU with more (NUMA) VRAM than any single physical GPU could have.

How do you do that, if each GPU expects to be its own backplane? One CPU daughterboard per GPU, and then the CPU daughterboards get SLIed together into one big CPU using NVLink? :P

1 comment

GPU as motherboard really only makes sense for gaming PCs. Even there, SXM might be easier.