alex here, original author of redpanda. it is hard to respond to a 6-part blog series, released all at once, on an HN thread.

- what we can deterministically show is data loss on apache kafka with no fsync() [shouldn't be a surprise to anyone]. stay tuned for an update here.
- the kafka partition model of one segment per partition could be optimized in both architectures.
- the benefit for all of us is that all of these things will be committed to the OMB (open messaging benchmark) and will be on git for anyone interested in running it themselves.
- we welcome all confluent customers (since the post is from the field cto office) to benchmark against us and choose the best platform. this is how engineering is done. In fact, we will help you run it at no cost. Your hardware, your workload, head-to-head. We'll help you set it up with both.... but let's keep the rest of the thread technical.
- log.flush.interval.messages=1 - this is something we took a stance on a long time ago, in 2019. As someone who has personally talked to hundreds of enterprises to date, I think most workloads in the world should err on the side of safety and flush to disk (fsync()). Hardware is very good today and you no longer have to choose between safety and reasonable performance. This isn't the high latency you used to see on spinning disks.
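For readers who want to see what a flush interval of 1 means at the syscall level, here is a minimal Python sketch. It is not Kafka or Redpanda code; `append_records` and `flush_every` are hypothetical names made up for illustration. It appends records to a file and calls fsync() once every `flush_every` records, so `flush_every=1` roughly mirrors the spirit of log.flush.interval.messages=1. Records written since the last fsync() sit only in the OS page cache and can be lost on power failure.

```python
import os
import tempfile


def append_records(path, records, flush_every=1):
    """Append records to a log file, calling fsync() every `flush_every` records.

    Hypothetical helper for illustration only. Returns the number of
    fsync() calls made. Records after the last fsync() are NOT forced
    to disk here; they are left to the OS page cache.
    """
    fsyncs = 0
    pending = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for record in records:
            view = memoryview(record)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            pending += 1
            if pending >= flush_every:
                os.fsync(fd)  # force the pending records to stable storage
                fsyncs += 1
                pending = 0
    finally:
        os.close(fd)
    return fsyncs


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.log")
    # One fsync() per record: the safe-by-default setting.
    print(append_records(path, [b"msg\n"] * 4, flush_every=1))
```

Timing this loop on your own hardware with different `flush_every` values is a quick way to get a rough feel for the per-message fsync() cost on modern SSDs versus spinning disks; it is no substitute for a real benchmark such as OMB.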
Kafka and fsyncs: https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-...