| > Your threads are really spread out, so it's bad for major workloads like Databases (which may communicate heavily, and also benefit from the "combined" L3 caches on the Intel Xeon Silver setup). As soon as you add the second node for a workload with poor locality like that, half your accesses are on the wrong node. No matter how many more nodes you add, the worst it can cost is that much again, since the remote share can only climb from 50% toward 100%. The unified L3 is also probably immaterial when the database exceeds the L3 by hundreds of gigabytes and it's ~100% cache misses either way. On the other hand, you have 60% more cores to make up for it. I'm not saying there are no circumstances or workloads where the Intel system makes sense, but it's kind of telling that you have to go looking for them as exceptions to the rule. For general-purpose things like running a bunch of unrelated VMs, which are affected by the number of nodes little if at all (or even benefit from it, because one misbehaving guest or process thrashing the caches and flooding memory bandwidth only impacts a single CCX/node), the higher core count seems like an obvious choice. It's also going to be interesting to see how databases optimize for multiple NUMA nodes now that they're common. There should be ways to determine which parts of the database are on which node and then prefer to dispatch queries to cores on the same node, or keep copies of the highest-hit-rate data on multiple nodes, etc. |
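
To put rough numbers on the "half your accesses" argument, and to sketch what node-aware dispatch might look like, here is a minimal Python illustration. It assumes the poor-locality worst case where data and threads are spread uniformly across nodes, and it ignores differences in hop latency between nodes. The names `remote_fraction`, `pick_node`, and the `page_to_node` map are hypothetical, made up for this sketch, and not taken from any real database engine.

```python
from collections import Counter
from fractions import Fraction


def remote_fraction(nodes: int) -> Fraction:
    """Share of memory accesses that land on a remote node.

    Assumes accesses are uniformly spread over all nodes (no locality),
    so only 1/nodes of them happen to be local.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return Fraction(nodes - 1, nodes)


def pick_node(pages_touched, page_to_node, default_node=0):
    """Choose the node that holds most of the pages a query will touch.

    page_to_node is a hypothetical map from page id to NUMA node id.
    Pages missing from the map are ignored; if none are known, fall
    back to default_node.
    """
    counts = Counter(
        page_to_node[p] for p in pages_touched if p in page_to_node
    )
    if not counts:
        return default_node
    return counts.most_common(1)[0][0]


# Remote share grows from 0 (one node) to 1/2 (two nodes), then only
# creeps toward 1 as more nodes are added.
for n in (1, 2, 4, 8):
    print(n, float(remote_fraction(n)))

# A query touching pages 1, 2, 3, where pages 2 and 3 live on node 1,
# would be dispatched to a core on node 1.
print(pick_node([1, 2, 3], {1: 0, 2: 1, 3: 1}))
```

Under this simplified model the jump from one node to two costs as much as every later node added combined, which is the point made above. A real engine would also have to weigh load balance against locality when choosing where to run a query.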