|
|
|
|
|
by nickpsecurity
852 days ago
|
|
Distributed, shared memory machines used to do exactly that in HPC space. They were a NUMA alternative. It works if the processing plus high-speed interconnect are collectively faster than the request rate. The 8x setups with NVLink are kind of like that model. You may have meant that nobody has a stack that uses clustering or DSM with low-latency interconnects. If so, then that might be worth developing given prior results in other low-latency domains. |
|
reformed HPC person here.
Yes, but not latency optimised in the case here. HPC is normally designed for throughput. Accessing memory from outside your $locality is normally horrifically expensive, so only done when you can't avoid it.
For most serving cases, you'd be much happier having a bunch of servers with a number of groqs in them, than managing a massive HPC cluster and trying to keep it both up and secure. The connection access model is much more traditional.
Shared memory clusters are not really compatible with secure enduser access. It is possible to partition memory access, but its something thats not off the shelf (well that might have changed recently.) Also, shared memory means shared fuckups.
I do get what you're hinting at, but if you want to serve low latency, high compute "messages" then discrete "APU" cards are a really good way to do it simply (assuming you can afford it). HPCs are fun, but its not fun trying to keep them up with public traffic on them