Hacker News
by tgraf 2762 days ago
[Disclaimer: I'm one of the Cilium authors]

We have been trying to reproduce the performance results ever since the article was published, as this is not in line at all with what we measure daily in our CI. We can easily achieve a multiple of these numbers.

There are some obvious flaws in the benchmarking scripts [0], such as using the "used" column of `free` without accounting for cached file buffers.
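Here is a minimal sketch of the `free` pitfall, assuming Linux `/proc/meminfo`-style input. Computing used memory naively as `MemTotal - MemFree` counts the page cache, which grows as benchmark files are read. Whether `free`'s own "used" column includes cache depends on the procps version (older versions did include it). The helper names and the sample values below are made up for illustration and are not measurements. Subtracting `Buffers`, `Cached` and `SReclaimable` only approximates what newer procps versions report.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines ("MemFree:  123 kB") into {field: kB}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def used_kb(fields: Dict[str, int], exclude_cache: bool = True) -> int:
    """Used memory in kB; with exclude_cache, buffers and reclaimable cache are not counted."""
    used = fields["MemTotal"] - fields["MemFree"]
    if exclude_cache:
        used -= fields.get("Buffers", 0) + fields.get("Cached", 0) + fields.get("SReclaimable", 0)
    return used


# Illustrative values only, not taken from the article.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:   12000000 kB
Buffers:          500000 kB
Cached:          9000000 kB
SReclaimable:     500000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)  # on a real host: parse_meminfo(open("/proc/meminfo").read())
    print("naive used (kB):      ", used_kb(info, exclude_cache=False))  # 14000000
    print("cache-aware used (kB):", used_kb(info))                      # 4000000
```

Taking a snapshot before and after each run with the cache-aware figure keeps file reads from showing up as "memory used by the plugin".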

However, that does not explain why HTTP and FTP perform worse than the TCP benchmark, which is doing wire speed with ease. None of the Cilium datapath is HTTP- or FTP-specific; HTTP traffic is only actually parsed when HTTP-specific security policies are in place.

We have requested more information on the scripts used by the author and continue to investigate. We will publish results as soon as we can reproduce this.

As other commenters have noted, most of these benchmarks are measuring the same Linux kernel code, except for Weave (OVS) and Cilium (BPF). However, at the specified MTU of 9000, the bottleneck for all plugins will not be the forwarding datapath. It will be the client and server code copying data in and out of the kernel, because very few packets are actually being created and forwarded.
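Here is a back-of-the-envelope sketch of why MTU 9000 shifts the bottleneck. Per-packet forwarding work scales with the packet rate, and filling a link with 9000-byte packets takes six times fewer packets than with 1500-byte packets. The 10 Gbit/s link speed is an assumption for illustration, not a figure from the article.

```python
def packets_per_second(link_bps: float, mtu_bytes: int) -> float:
    """Full-size packets per second needed to saturate a link.

    Ignores Ethernet framing overhead (header, FCS, preamble, inter-frame gap),
    so real rates are somewhat lower; the ratio between MTUs is what matters here.
    """
    return link_bps / (mtu_bytes * 8)


LINK_BPS = 10e9  # assumed 10 Gbit/s link; not a figure from the article

if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ~{packets_per_second(LINK_BPS, mtu):,.0f} packets/s")
    # MTU 1500: ~833,333 packets/s
    # MTU 9000: ~138,889 packets/s
```

At roughly 139k packets/s, per-packet datapath cost matters much less than the per-byte cost of copying data between user space and the kernel. That is consistent with the point above.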

[0] https://gist.github.com/AlexisDucastel/ebb884831aeec5827e4df...


> most of these benchmarks are measuring the same Linux kernel code

This, 1000x this. I'm afraid too many people treat their CNI plugin as 'magic', whilst many of them really aren't. 'Host' versus Calico is basically benchmarking the impact of a Linux bridge device, plus perhaps a few more iptables rules than the host has. That depends on whether the host benchmark has iptables enabled at all, whether K8s network security policies are in place and enforced by Calico, etc.

Also, configuration details are lacking. E.g., in the Calico benchmarks, was IP-in-IP (ipip) enabled or not?
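On the ipip question: IP-in-IP encapsulation wraps each packet in an extra 20-byte outer IPv4 header. The sketch below counts the header-byte cost. It assumes IPv4 and TCP headers without options (TCP timestamps would add 12 bytes) and shows why the answer matters more at MTU 1500 than at MTU 9000. It counts bytes only and says nothing about the CPU cost of encapsulation and decapsulation.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options
IPIP_OUTER = 20   # bytes, extra outer IPv4 header added by IP-in-IP


def tcp_payload_per_packet(mtu: int, ipip: bool) -> int:
    """Max TCP payload in one packet that fits the link MTU, with or without IPIP."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if ipip:
        payload -= IPIP_OUTER  # inner packet must fit in MTU minus the outer header
    return payload


def ipip_payload_loss(mtu: int) -> float:
    """Fraction of per-packet TCP payload given up to the IPIP outer header."""
    plain = tcp_payload_per_packet(mtu, ipip=False)
    return (plain - tcp_payload_per_packet(mtu, ipip=True)) / plain


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {ipip_payload_loss(mtu):.2%} less payload per packet with IPIP")
    # MTU 1500: 1.37% less payload per packet with IPIP
    # MTU 9000: 0.22% less payload per packet with IPIP
```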

Yes, most solutions use the Linux kernel, so what's being measured is indeed the impact of how the kernel is configured to achieve container networking. But that doesn't make those design choices, or the tests, meaningless. Calico, for example, contrary to your assumption, uses neither a Linux bridge device nor iptables for packet forwarding. (It does use iptables for policy enforcement, but that's not being tested here.)
I'm aware it doesn't use iptables except to implement network policies, hence the reference. Good catch about the bridge, my bad; it makes sense that one isn't used, given Calico is L3...
"However, it does not explain why HTTP and FTP are worse compared to the TCP benchmark which is doing wire speed at ease"

Do you have more detailed info on the configuration and commands you used? Nginx, for example, doesn't have sendfile() turned on by default. That's just one example of a configuration setting that might change benchmark results.
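To make the sendfile() point concrete, here is a minimal Python sketch, assuming Linux semantics. It also echoes the earlier point about copying data in and out of the kernel. It contrasts a plain read()/send() loop, where every chunk passes through a user-space buffer, with `os.sendfile()`, where the kernel moves file data to the socket directly. The helper names (`send_with_copy`, `send_with_sendfile`, `roundtrip`) are hypothetical. This is not how nginx implements its `sendfile` directive, only an illustration of the two copy paths.

```python
import os
import socket
import tempfile
import threading
from typing import Callable


def send_with_copy(path: str, sock: socket.socket, chunk: int = 64 * 1024) -> int:
    """read()/sendall() loop: each chunk is copied from the kernel into a
    user-space buffer and then back into the kernel's socket buffer."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            sock.sendall(buf)
            sent += len(buf)
    return sent


def send_with_sendfile(path: str, sock: socket.socket) -> int:
    """os.sendfile(): the kernel moves file data to the socket without a
    round trip through user space (Linux semantics assumed)."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected EOF on the source file
                break
            sent += n
    return sent


def roundtrip(sender: Callable[[str, socket.socket], int], payload: bytes) -> bytes:
    """Write payload to a temp file, send it with `sender`, return what arrived."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    a, b = socket.socketpair()
    received = bytearray()

    def reader() -> None:
        # Drain the receiving end concurrently so the sender never blocks forever.
        while True:
            data = b.recv(65536)
            if not data:
                break
            received.extend(data)

    t = threading.Thread(target=reader)
    t.start()
    try:
        sender(path, a)
    finally:
        a.shutdown(socket.SHUT_WR)  # signal EOF so the reader thread exits
        t.join()
        a.close()
        b.close()
        os.unlink(path)
    return bytes(received)


if __name__ == "__main__":
    data = os.urandom(1 << 20)  # 1 MiB of random bytes
    for fn in (send_with_copy, send_with_sendfile):
        ok = roundtrip(fn, data) == data
        print(f"{fn.__name__}: {'ok' if ok else 'MISMATCH'}")
```

Both paths deliver the same bytes. The difference is how many times each byte crosses the user/kernel boundary, and that is exactly the kind of server-side setting that can swing an HTTP or FTP benchmark independently of the CNI plugin.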

As somebody who has created a product in the past and also reviewed quite a few, I've given up doing performance comparisons. This is quite sad, as comparisons help people save time and money and cut through the marketing that technical people hate.

Every time I've done a performance comparison an expert pops up and says the result is invalid because of X. It takes 10 seconds to write the comment but perhaps a few hours to redo the tests and update the blog contents.

The blogger doesn't want an inaccurate blog post, and the software authors don't want bad benchmarks left up that constantly crop up in search results. As a blogger, you feel somewhat duty-bound to update a post that you know probably won't be re-read by most of the people who have already opened it anyway.

My conclusion is that, in most cases, the fault should fall on the software developer. Having created a startup, I understand the time pressures and motivations driving the roadmap. There is a natural tendency to work on the differentiators and high-value, complex features. Blogs like this should act as a reminder that there is massive value in prioritising sane defaults, tests and documentation. There is just as much value in building logic into the application that makes performance-affecting misconfiguration unlikely.
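The suggestion to build logic into the application can be sketched as a startup sanity check. The settings below (`link_mtu`, `overlay_mtu`, `encap_overhead`, `sendfile`) and the `NetConfig`/`check_config` names are hypothetical and not taken from any CNI plugin or from nginx. The point is only that software can flag the kinds of misconfiguration discussed in this thread instead of leaving users to discover them through bad benchmarks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetConfig:
    # Hypothetical settings, for illustration only.
    link_mtu: int        # MTU of the underlying host interface
    overlay_mtu: int     # MTU configured inside the container network
    encap_overhead: int  # bytes added per packet by encapsulation (e.g. 20 for IP-in-IP, 0 for none)
    sendfile: bool       # whether the file server uses sendfile()


def check_config(cfg: NetConfig) -> List[str]:
    """Return human-readable warnings for settings likely to distort performance."""
    warnings: List[str] = []
    max_overlay = cfg.link_mtu - cfg.encap_overhead
    if cfg.overlay_mtu > max_overlay:
        warnings.append(
            f"overlay MTU {cfg.overlay_mtu} exceeds {max_overlay}; "
            "encapsulated packets may be fragmented or dropped"
        )
    elif cfg.overlay_mtu < max_overlay:
        warnings.append(
            f"overlay MTU {cfg.overlay_mtu} is below {max_overlay}; "
            "more packets than necessary per byte transferred"
        )
    if not cfg.sendfile:
        warnings.append("sendfile disabled; file serving pays extra user-space copies")
    return warnings


if __name__ == "__main__":
    cfg = NetConfig(link_mtu=1500, overlay_mtu=1500, encap_overhead=20, sendfile=False)
    for w in check_config(cfg):
        print("WARNING:", w)
```

Warning loudly at startup is a middle ground between silently auto-correcting settings and relying on bold text in a README.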

From reading this blog, I get the sense the author is quite technical. A positive public relations move would be to spend your time replicating the results and then, when the problem is found, make it difficult for the next person to hit the same issue. Preferably that means logic in the software, or in the worst case some bold text near the top of the README so it's not buried somewhere obscure.

> this is not in line at all with what we measure daily in our CI

But you set up your CI. This guy’s numbers are a lot closer to what I or another CNI n00b would get trying to set something up.

OTOH if you’re already a CNI expert, you wouldn’t be reading this article.

As someone wondering which CNI to choose, I found this article helpful.