	by purpleblue 1135 days ago
	TLDR: "I work at Confluent, the owners of Kafka, and I have determined through my tests that Redpanda's performance is greatly exaggerated." I don't think we can get a less reliable or trustworthy set of performance tests than when someone's paycheck depends on the outcome of those tests. If Redpanda's performance were found to be better, would he really publish the test results?

7 comments

insanitybit 1135 days ago

I mean, the other benchmarks we have are from Redpanda, so we're comparing one biased set of benchmarks to another biased set of benchmarks. Ultimately it's a matter of the reader understanding the methodology and drawing their own conclusions based on their own experience. I appreciate that the author explains the changes they've made, the impact of those changes, and why they think the changes are reasonable (e.g., disabling fsync).

Personally I'm happy to see companies competing on performance like this. If one company puts out benchmarks I want to see their competition come in with their own benchmarks. Ideally we'll see improvements to both products, and a refined benchmarking suite and philosophy.

notfromhere 1135 days ago

Disabling fsync is dubious.
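
For anyone wondering why the fsync setting matters so much in a benchmark, here is a minimal Python sketch. It is not taken from either vendor's benchmark, and the names `write_records` and `compare` are made up for illustration. It writes the same records twice, once calling `os.fsync` after every record and once leaving the data to the OS page cache. Without fsync, a write can return before the data reaches the device, so it is faster but can be lost on a crash or power failure.

```python
import os
import tempfile
import time


def write_records(path, records, sync_each):
    """Write records to path and return the elapsed time in seconds.

    Hypothetical helper: if sync_each is True, every record is flushed
    and fsynced before the next one is written.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if sync_each:
                f.flush()             # move Python's buffer into the OS
                os.fsync(f.fileno())  # ask the OS to persist it to the device
    return time.perf_counter() - start


def compare(n=200, size=1024):
    """Time n records of `size` bytes with and without per-record fsync."""
    records = [b"x" * size for _ in range(n)]
    with tempfile.TemporaryDirectory() as d:
        synced = write_records(os.path.join(d, "synced.log"), records, True)
        unsynced = write_records(os.path.join(d, "unsynced.log"), records, False)
    return synced, unsynced


if __name__ == "__main__":
    synced, unsynced = compare()
    print(f"fsync per record: {synced:.4f}s, no fsync: {unsynced:.4f}s")
```

The absolute numbers depend heavily on the disk, filesystem, and OS, so treat this as a way to see the trade-off rather than as a measurement of any particular product.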

I do find it interesting that Confluent feels the need to respond to RP given the disparities in size, install base, etc.

morelisp 1135 days ago

I've been watching Redpanda for a couple years primarily because I'm interested in their wasm data transformations. In the past 3 months I've heard it mentioned several dozen times by other teams in our company, vs. maybe 2-3 times in the >1y prior. So something seems in the air, and presumably Confluent has noticed.

I'm not sure why; Kafka per se doesn't seem to have really dropped any significant balls lately (and we're self-hosted so Confluent isn't very relevant).

insanitybit 1135 days ago

Everyone is thinking about their cloud costs right now, so something that offers higher perf and lower ops is going to be more relevant today.

agallego 1135 days ago

We’re about to release a revamped wasm and a new SDK built on previous lessons learned. Should be cool.

alexisread 1135 days ago

Any sign of JSON schema in the registry? That would be great if so!

minhazm 1135 days ago

I actually enjoy these kinds of benchmarks. They're both incentivized to show their own platforms running in the most optimal setups and they're also incentivized to call out any BS from the other party. In the end users get to see the good and the bad of both platforms.

For this particular post I like that they explained each settings change they're making and why. In many of these benchmarks people will make some change and either not mention it or won't explain why they made the change and users are left trying to figure it out.

whartung 1135 days ago

I don't think who does the benchmark, any benchmark, matters as long as they're open about how it was done, what properties were set, ideally why they were set, and what their results were. The big picture goal is to ostensibly be able to reproduce such benchmarks.

But I've found through industry that most benchmarks, especially for infrastructure software, are performed by the vendors. The burden for standing up the system(s) to pull off the benchmark is usually high enough that independents are rarely going to take up that banner and do it themselves.

Also, notably with closed source systems, some vendors don't license their software in a way that allows public benchmarks.

So, transparency is all we can really hope for.

I remember the halcyon days of the database wars with the vendors publishing new benchmarks seemingly every month. Fun to watch "Lies, damn lies, and statistics" rear up on its hind legs and roar. And some of the monster clusters of hardware these folks put together were legion.

Similarly I enjoyed when Sun was publishing JEE benchmarks on cheap hardware running Glassfish against MySQL. At least they were publishing on these smaller systems more akin to what many companies may run internally in contrast to these million dollar cluster benchmarks BEA and Oracle were publishing.

Finally, just to throw this out, modern hardware is just extraordinary. Hard to appreciate how fast modern machines are if you didn't live with them in the old days.

We're in the glory days where we, most of us, simply don't care. Off the shelf hardware running untuned servers with reasonable algorithms have so much bandwidth and capability, just gets harder and harder to saturate today.

hodgesrm 1135 days ago

> Off the shelf hardware running untuned servers with reasonable algorithms have so much bandwidth and capability, just gets harder and harder to saturate today.

Interestingly that's not necessarily the case in the public cloud. I'm messing around with AWS storage for an upcoming talk. You definitely can saturate storage on AWS, and it's sometimes hard to tell why.

jvanlightly 1135 days ago

Author here. Anyone can run these tests and check my results.

snotrockets 1135 days ago

Confluent doesn't own Kafka. Apache Kafka is an Apache project, with its own governance structure. Some of the project management committee is employed by Confluent, but not all: e.g., the current PMC chair is Mickael Maison, employed by Red Hat. See https://projects.apache.org/committee.html?kafka

purpleblue 1135 days ago

The Kafka PMC is utterly dominated by current or former Confluent employees. Everything Kafka does has been and always will be with Confluent's best interest first and foremost. The idea that Kafka isn't completely controlled by Confluent would be disingenuous at best. I don't have anything against Kafka or Confluent, but people should call a spade a spade here when it's blatantly obvious.

snotrockets 1135 days ago

This is a very dim view, which doesn't match what I personally have seen, but to each his own.

EdwardDiego 1135 days ago

Confluent don't own Kafka :)

uberduper 1135 days ago

Apache Software Foundation owns Kafka.

purpleblue 1135 days ago

Meh. It's obvious Confluent exploited the status of being an Apache open source project in order to say they were open-source. But look at the makeup of the PMC of Kafka and it's completely dominated by Confluent employees or former employees. Nothing gets done without Confluent's approval or best interest at heart.

EdwardDiego 1135 days ago

Well, they did write most of it, and the PMC composition is changing.

> Nothing gets done without Confluent's approval or best interest at heart.

I disagree. This explicitly competes against the tiered storage in Confluent's enterprisey Kafka flavour: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A...

uberduper 1135 days ago

True. I wasn't trying to suggest there wasn't a bias here or minimize Confluent's involvement in the project.