Hacker News
by purpleblue 1135 days ago
TLDR: "I work at Confluent, the owners of Kafka, and I have determined through my tests that Redpanda's performance is greatly exaggerated."

I don't think we can get a less reliable or trustworthy set of performance tests than when someone's paycheck depends on the outcome of those tests. If Redpanda's performance were found to be better, would he really publish the test results?

7 comments

I mean, the other benchmarks we have are from Redpanda, so we're comparing one biased set of benchmarks to another biased set of benchmarks. Ultimately it's a matter of the reader understanding the methodology and drawing their own conclusions based on their own experience. I appreciate that the author explains the changes they've made, the impact of those changes, and why they think the changes are reasonable (e.g., disabling fsync).

Personally I'm happy to see companies competing on performance like this. If one company puts out benchmarks I want to see their competition come in with their own benchmarks. Ideally we'll see improvements to both products, and a refined benchmarking suite and philosophy.

Disabling fsync is dubious.
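
For anyone who hasn't measured what fsync costs, here's a minimal sketch of the general tradeoff. It is not taken from either vendor's benchmark; the function names (`timed_appends`, `compare`) and the record counts are made up for illustration. It appends the same records to two files, once leaving the flush to the OS and once calling fsync after every write.

```python
import os
import tempfile
import time


def timed_appends(path, records, sync_each_write):
    """Append records to path, optionally fsyncing after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
            if sync_each_write:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device
    return time.perf_counter() - start


def compare(num_records=200, record_size=1024):
    """Time the same workload with and without per-write fsync.

    num_records and record_size are arbitrary illustrative values.
    """
    records = [b"x" * record_size for _ in range(num_records)]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, sync in (("no_fsync", False), ("fsync_each_write", True)):
            path = os.path.join(tmp, label + ".log")
            results[label] = timed_appends(path, records, sync)
            # Both runs end up with the same bytes on the filesystem.
            assert os.path.getsize(path) == num_records * record_size
    return results


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.4f}s")
```

How big the gap is depends heavily on the storage device and filesystem (on a RAM-backed temp directory fsync can be nearly free), so treat any numbers from this as local to your machine. The durability side is the part a timing can't show: without the fsync, acknowledged writes may still be sitting in the OS page cache when the power goes out.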

I do find it interesting that Confluent feels the need to respond to RP given the disparities in size, install base, etc.

I've been watching Redpanda for a couple years primarily because I'm interested in their wasm data transformations. In the past 3 months I've heard it mentioned several dozen times by other teams in our company, vs. maybe 2-3 times in the >1y prior. So something seems in the air, and presumably Confluent has noticed.

I'm not sure why; Kafka per se doesn't seem to have really dropped any significant balls lately (and we're self-hosted so Confluent isn't very relevant).

Everyone is thinking about their cloud costs right now, so something that offers higher perf and lower ops is going to be more relevant today.
We're about to release a revamped wasm and a new SDK that incorporate lessons learned from the previous version. Should be cool.
Any sign of JSON schema in the registry? That would be great if so!
I actually enjoy these kinds of benchmarks. They're both incentivized to show their own platforms running in the most optimal setups and they're also incentivized to call out any BS from the other party. In the end users get to see the good and the bad of both platforms.

For this particular post I like that they explained each settings change they're making and why. In many of these benchmarks people will make some change and either won't mention it or won't explain why they made it, and users are left trying to figure it out.

I don't think who does the benchmark, any benchmark, matters as long as they're open about how it was done, what properties were set, ideally why they were set, and what their results were. The big picture goal is to ostensibly be able to reproduce such benchmarks.

But I've found through industry that most benchmarks, especially for infrastructure software, are performed by the vendors. The burden for standing up the system(s) to pull off the benchmark is usually high enough that independents are rarely going to take up that banner and do it themselves.

Also, notably with closed-source systems, some vendors' licenses don't allow publishing benchmarks.

So, transparency is all we can really hope for.

I remember the halcyon days of the database wars with the vendors publishing new benchmarks seemingly every month. Fun to watch "Lies, damn lies, and statistics" rear up on its hind legs and roar. And some of the monster clusters of hardware these folks put together were legion.

Similarly I enjoyed when Sun was publishing JEE benchmarks on cheap hardware running Glassfish against MySQL. At least they were publishing on these smaller systems more akin to what many companies may run internally in contrast to these million dollar cluster benchmarks BEA and Oracle were publishing.

Finally, just to throw this out, modern hardware is just extraordinary. Hard to appreciate how fast modern machines are if you didn't live with them in the old days.

We're in the glory days where we, most of us, simply don't care. Off the shelf hardware running untuned servers with reasonable algorithms have so much bandwidth and capability, just gets harder and harder to saturate today.

> Off the shelf hardware running untuned servers with reasonable algorithms have so much bandwidth and capability, just gets harder and harder to saturate today.

Interestingly that's not necessarily the case in the public cloud. I'm messing around with AWS storage for an upcoming talk. You definitely can saturate storage on AWS, and it's sometimes hard to tell why.

Author here. The tests are available for anyone to run and check my results.
Confluent doesn't own Kafka. Apache Kafka is an Apache project, with its own governance structure. Some members of the project management committee are employed by Confluent, but not all: e.g., the current PMC chair is Mickael Maison, employed by Red Hat. See https://projects.apache.org/committee.html?kafka
The Kafka PMC is utterly dominated by current or former Confluent employees. Everything Kafka does has been and always will be with Confluent's best interest first and foremost. The idea that Kafka isn't completely controlled by Confluent would be disingenuous at best. I don't have anything against Kafka or Confluent, but people should call a spade a spade here when it's blatantly obvious.
This is a very dim view, which doesn't match what I personally have seen, but to each his own.
Confluent don't own Kafka :)
The Apache Software Foundation owns Kafka.
Meh. It's obvious Confluent exploited the status of being an Apache open source project in order to say they were open-source. But look at the makeup of the Kafka PMC and it's completely dominated by Confluent employees or former employees. Nothing gets done without Confluent's approval or best interest at heart.
Well, they did write most of it, and the PMC composition is changing.

> Nothing gets done without Confluent's approval or best interest at heart.

I disagree. This explicitly competes against the tiered storage in Confluent's enterprisey Kafka flavour: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A...

True. I wasn't trying to suggest there wasn't a bias here or minimize Confluent's involvement in the project.