And you can go one step further using receive flow steering[1] or transmit packet steering (XPS). Most modern performance-oriented network cards (Intel 10G, Solarflare 10G, anything from Mellanox, Chelsio, etc.) expose each receive queue as its own interrupt, and the queues show up as separate entries in the right-hand column of /proc/interrupts. You can pin those rx/tx queues to different cores, ideally on the same socket as the application (though not necessarily the same core), for minimum latency.
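To make that concrete, here is a rough Python sketch of how you might find a NIC's per-queue IRQs in /proc/interrupts and plan a queue-to-core mapping. The interface name "eth0", the core list [2, 3], and the helper names are all made up for illustration, and queue naming (e.g. "eth0-rx-0") varies by driver, so treat the substring match as an assumption. It only prints a plan; actually applying it means writing the CPU number to /proc/irq/<irq>/smp_affinity_list as root.

```python
import os
from typing import Dict, List, NamedTuple


class QueueIrq(NamedTuple):
    irq: int            # IRQ number from the left-hand column
    name: str           # last column, e.g. "eth0-rx-0" (driver-dependent)
    counts: List[int]   # per-CPU interrupt counts


def parse_queue_irqs(interrupts_text: str, ifname: str) -> List[QueueIrq]:
    """Return numbered IRQs whose name column contains ifname."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    found = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC and other non-numbered rows
        fields = rest.split()
        if len(fields) <= ncpus:
            continue  # no name column present
        name = fields[-1]
        if ifname not in name:
            continue
        counts = [int(x) for x in fields[:ncpus]]
        found.append(QueueIrq(int(label), name, counts))
    return found


def plan_affinity(queues: List[QueueIrq], cpus: List[int]) -> Dict[int, int]:
    """Assign each queue IRQ to a core, round-robin over cpus."""
    if not cpus:
        raise ValueError("need at least one CPU")
    return {q.irq: cpus[i % len(cpus)] for i, q in enumerate(queues)}


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            queues = parse_queue_irqs(f.read(), "eth0")  # hypothetical NIC name
        # Hypothetical cores assumed to share a socket with the application.
        for irq, cpu in plan_affinity(queues, [2, 3]).items():
            print(f"echo {cpu} > /proc/irq/{irq}/smp_affinity_list")
```

Round-robin is just the simplest choice here; in practice you would pick the core list from the NUMA topology, and irqbalance may need to be stopped so it does not undo the pinning.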
Linux has some really impressive knobs[2] for optimizing these sorts of weird workloads.
[1] https://lwn.net/Articles/382428/
[2] https://www.kernel.org/doc/Documentation/networking/scaling....