Hacker News
by pclmulqdq 1441 days ago
This was a fascinating read and the kernel does quite nicely in comparison - 66% of DPDK performance is amazing. That said, the article completely nails the performance advantage: DPDK doesn't do a lot of stuff that the kernel does. That stuff takes time. If I recall correctly, DPDK abstractions themselves cost a bit of NIC performance, so it might be interesting to see a comparison including a raw NIC-specific kernel bypass framework (like the SolarFlare one).
3 comments

> If I recall correctly, DPDK abstractions themselves cost a bit of NIC performance, so it might be interesting to see a comparison including a raw NIC-specific kernel bypass framework (like the SolarFlare one).

DPDK performs fairly well, even better for the most part. For some years I maintained a modification of the ixgbe kernel driver for Intel NICs that allowed us to perform high-performance traffic capture. We finally moved to DPDK once it was stable enough and we had the need to support more NICs, and in our comparisons we didn't see a performance hit.

Maybe manufacturer-made drivers can be better than DPDK, but if I had to guess, that would be because of their knowledge of the NIC architecture and parameters rather than because of the abstractions. I remember when we tried to do a PoC of a Mellanox driver modification: a lot of the work to get high performance was understanding the NIC options and tweaking them to get the most out of them for our use case.

Or better yet, Mellanox VMA since it's open source (unlike Solarflare OpenOnload) and the NICs are far less expensive.
That's not true. OpenOnload is open source:

https://github.com/majek/openonload

It has been OSS since forever, but I thought that there were some patent gotchas. In any case, it now apparently even supports non-SolarFlare NICs!

Onload also has other nice features, like accelerating machine-local communication (pipes, Unix sockets and so on).

I have found OpenOnload to be easier to use than VMA, although I think you can go a bit faster with Mellanox NICs.
Is there a good comparison of these technologies? I've used DPDK for high-rate streaming data and it roughly doubled my throughput over 10GE. I hear of people using things like DMA over Ethernet, and it sounds like there are several competing technologies. My use case is to get something from the PHY layer into GPU memory as fast as possible; latency is less important than throughput.
What you're looking for is RDMA. It was mostly restricted to InfiniBand (IB) back in the day, but nowadays you probably want RoCEv2. You can look at iWARP too, but I think RoCE has won.

In any case, the standard software API for RDMA is ibverbs. All adapters supporting RDMA (be it IB, RoCE or iWARP) will expose it. You can get cloud instances with RDMA on AWS and Azure.
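
As a quick way to check whether a given Linux machine or cloud instance exposes any RDMA device at all, here is a minimal sketch. It assumes the usual Linux convention that RDMA devices registered with the kernel (IB, RoCE or iWARP alike) appear as entries under `/sys/class/infiniband`. The function name `list_rdma_devices` and its `sysfs_root` parameter are made up for illustration and are not part of ibverbs.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices found under sysfs_root.

    Assumption: on Linux, each RDMA-capable device registered with the
    kernel shows up as one entry in /sys/class/infiniband, whatever the
    transport (IB, RoCE or iWARP). Returns an empty list when the
    directory does not exist, e.g. when no RDMA driver is loaded.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices found")
```

This only tells you a device is present. Actually moving data still goes through the verbs API itself.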

DPDK has RDMA/GPUDirect support now as well.