Hacker News new | ask | show | jobs
by theojulienne 2871 days ago
We found that we could achieve 10G line rate with just the queues available to the VF, the NIC didn't seem to be a bottleneck providing DPDK was processing packets faster than line rate. It's worth noting that other traffic on the PF was/is minimal in our setup.

We tested this using DPDK pktgen on a identically-configured node (GLB Director and pktgen both using DPDK on a VF with flow bifurcation, on 2 separate machines on the same rack/switch), with GLB Director essentially acting as a reflector back to the pktgen node. pktgen was able to generate enough 40 byte TCP packets to saturate 10G with 2 TX cores/queues, and GLB Director was able to process those packets and encapsulate them with a sizeable set of binds/tables with 3 cores doing work (encapsulation) and 1 core doing RX/distribution/TX.

1 comments

Yeah, 10G just isn't that much nowadays. And bigger NICs have more features in the VFs.

I've just built a quick test setup:

* two directly connected servers

* 6 core 2.4 GHz CPU

* XL710 40G NICs

* My packet generator MoonGen: https://github.com/emmericp/MoonGen with a quick & dirty modification to l3-tcp-syn-flood.lua to change dst mac

Got these results for 1-5 worker threads in Mpps: 3.84, 6.65, 10.17, 11.57, 11.3.

~10 Mpps is about 10G line rate for the encapsulated packets; this seems a little bit slower than I expected and it looks like I might be hitting the bottleneck of the distributor at 4 worker threads. Didn't look into anything in detail here (spent maybe 30 minutes for setup + tests), but we've done some VXLAN stuff in the past which I recall being faster.