by emmericp 2877 days ago
Cool use of SR-IOV, I like it. We've done a few (academic) experiments with SR-IOV for flow bifurcation and we've wondered why no one seems to use it like this. The performance was quite good: negligible difference between the PF and a single VF, and only a 5-10% drop when running 8 or more VFs (probably cache contention somewhere in our specific setup).

You seem to be running this on X540 NICs. Aren't you running into limitations for the VFs? Mostly the number of queues, which I believe is limited to 2 per VF in the ixgbe family. I wonder whether the AF_XDP DPDK driver could be used instead if SR-IOV isn't available or feasible for some reason.

A more detailed look at performance would have been cool. I might try it myself if I find some time (or a student) :)


We found that we could achieve 10G line rate with just the queues available to the VF. The NIC didn't seem to be a bottleneck, provided DPDK was processing packets faster than line rate. It's worth noting that other traffic on the PF was/is minimal in our setup.

We tested this using DPDK pktgen on an identically configured node: GLB Director and pktgen both used DPDK on a VF with flow bifurcation, on 2 separate machines on the same rack/switch, with GLB Director essentially acting as a reflector back to the pktgen node. pktgen was able to generate enough 40-byte TCP packets to saturate 10G with 2 TX cores/queues, and GLB Director was able to process those packets and encapsulate them with a sizeable set of binds/tables with 3 cores doing work (encapsulation) and 1 core doing RX/distribution/TX.
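For readers who want to sanity-check what "saturating 10G" means in packets per second, here is a rough sketch of the usual line-rate arithmetic. It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame) and that anything smaller than 64 bytes is padded to the minimum frame size. The function name and constants are made up for illustration and are not taken from pktgen or GLB Director.

```python
# Per-frame overhead on the wire that is not part of the Ethernet frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20
# Minimum Ethernet frame size, including the 4-byte FCS.
MIN_FRAME_BYTES = 64


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Return the maximum packets per second a link can carry at this frame size.

    Frames shorter than the Ethernet minimum are treated as padded to 64 bytes.
    """
    frame_bytes = max(frame_bytes, MIN_FRAME_BYTES)
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # Print the theoretical ceiling for a few frame sizes on a 10G link.
    for size in (64, 128, 256):
        mpps = line_rate_pps(10e9, size) / 1e6
        print(f"{size:4d} B frames: {mpps:.2f} Mpps")
```

For minimum-size frames on a 10G link this gives the familiar figure of about 14.88 Mpps. Larger (for example encapsulated) frames lower the packet rate needed to fill the link.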

Yeah, 10G just isn't that much nowadays. And bigger NICs have more features in the VFs.

I've just built a quick test setup:

* two directly connected servers

* 6 core 2.4 GHz CPU

* XL710 40G NICs

* My packet generator MoonGen: https://github.com/emmericp/MoonGen with a quick & dirty modification to l3-tcp-syn-flood.lua to change dst mac

Got these results for 1-5 worker threads in Mpps: 3.84, 6.65, 10.17, 11.57, 11.3.

~10 Mpps is about 10G line rate for the encapsulated packets; this seems a little bit slower than I expected and it looks like I might be hitting the bottleneck of the distributor at 4 worker threads. Didn't look into anything in detail here (spent maybe 30 minutes for setup + tests), but we've done some VXLAN stuff in the past which I recall being faster.
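To make the scaling behaviour in those numbers easier to see, here is a small sketch that turns the five measurements into per-worker throughput and scaling efficiency relative to the single-worker run. The Mpps values are copied from the results above. The function name and the efficiency definition (measured rate divided by worker count times the single-worker rate) are just one way to look at it, not something the test tooling reports.

```python
# Throughput in Mpps for 1-5 worker threads, copied from the results above.
RESULTS_MPPS = [3.84, 6.65, 10.17, 11.57, 11.3]


def scaling_table(results):
    """Return one (workers, mpps, per_worker_mpps, efficiency) row per result.

    efficiency is measured throughput divided by perfect linear scaling
    from the single-worker result.
    """
    base = results[0]
    rows = []
    for workers, mpps in enumerate(results, start=1):
        per_worker = mpps / workers
        efficiency = mpps / (workers * base)
        rows.append((workers, mpps, per_worker, efficiency))
    return rows


if __name__ == "__main__":
    print("workers   Mpps  Mpps/worker  efficiency")
    for workers, mpps, per_worker, eff in scaling_table(RESULTS_MPPS):
        print(f"{workers:7d} {mpps:6.2f} {per_worker:12.2f} {eff:10.0%}")
```

Efficiency drops noticeably between 3 and 4 workers and total throughput falls slightly at 5, which is consistent with a shared stage becoming the limit around 4 workers.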