Hacker News
by nemothekid 1135 days ago
>Issue #1 is that Kafka’s server.properties file has the line log.flush.interval.messages=1, which forces Kafka to fsync on each message batch. So all tests, even those where this is not configured in the workload file, will get this fsync behavior. I have previously blogged about how Kafka uses recovery instead of fsync for safety.
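To make the quoted setting concrete, here is a minimal Python sketch of a log that fsyncs after every `flush_interval` appended messages. The `FlushingLog` class and `count_fsyncs` helper are hypothetical and only loosely mirror what `log.flush.interval.messages` controls. This is not Kafka's implementation, and each append here is a single message rather than a producer batch.

```python
import os
import tempfile


class FlushingLog:
    """Hypothetical append-only log that fsyncs after every `flush_interval` messages.

    Loosely models the idea behind Kafka's log.flush.interval.messages; NOT Kafka's code.
    """

    def __init__(self, path, flush_interval):
        self.f = open(path, "ab")
        self.flush_interval = flush_interval
        self.unflushed = 0      # messages written since the last fsync
        self.fsync_calls = 0    # how many times we forced data to stable storage

    def append(self, message: bytes):
        self.f.write(message + b"\n")
        self.unflushed += 1
        if self.unflushed >= self.flush_interval:
            self.f.flush()             # push Python's buffer into the OS page cache
            os.fsync(self.f.fileno())  # force the page cache out to stable storage
            self.fsync_calls += 1
            self.unflushed = 0

    def close(self):
        self.f.close()


def count_fsyncs(n_messages, flush_interval):
    """Append n_messages to a temporary log and report how many fsyncs were issued."""
    with tempfile.TemporaryDirectory() as d:
        log = FlushingLog(os.path.join(d, "segment.log"), flush_interval)
        for i in range(n_messages):
            log.append(f"msg-{i}".encode())
        log.close()
        return log.fsync_calls


if __name__ == "__main__":
    print(count_fsyncs(1000, 1))       # one fsync per message
    print(count_fsyncs(1000, 10_000))  # no fsyncs; flushing is left to the OS
```

With an interval of 1, every append pays for an fsync; with a very large interval, durability of recent writes depends on the OS flushing the page cache and, in Kafka's case, on replication.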

Respect to the Kafka team as Kafka is an incredible piece of software, but the Mongo guys got torched for eternity for pulling the same shenanigans.


Kafka, unlike MongoDB, relies on recovery/replication instead of fsync:

https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-...

Kafka has never tried to hide that fact and it does not, in any way, make Kafka unsafe.

I don't think Kafka eschewing fsyncs is a bad thing; I'm aware of the risks. What I'm pointing out, and what got Mongo killed in the court of public opinion, was saying "our database is blazing fast because we turned off fsyncs".

Benchmarking a system that fsyncs every write against one that doesn't isn't an apples-to-apples comparison. You are free to argue that you might not need fsyncs, but if you are benchmarking systems and one of them fsyncs by default, that is the level of durability I'm going to expect from all of them; otherwise I can assume the other guy would be just as fast if he turned off fsyncs as well.
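A rough way to see why the comparison is lopsided is to time the same write workload with and without an fsync after each record. This is a hypothetical Python micro-benchmark: `write_records` and `compare` are illustrative names, and the 100-byte record size is an arbitrary assumption. Absolute numbers depend heavily on the disk, filesystem and OS caching, so treat it as a sketch rather than a benchmark of either Kafka or MongoDB.

```python
import os
import tempfile
import time

RECORD = b"x" * 100  # hypothetical 100-byte record


def write_records(path, n, fsync_each):
    """Write n records to path, optionally fsyncing after each one; return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n):
            f.write(RECORD)
            if fsync_each:
                f.flush()             # move Python's buffer into the OS page cache
                os.fsync(f.fileno())  # block until the OS reports the data is on stable storage
    return time.perf_counter() - start


def compare(n=500):
    """Run both variants on the same workload and report timings and bytes written."""
    with tempfile.TemporaryDirectory() as d:
        synced_path = os.path.join(d, "synced.log")
        unsynced_path = os.path.join(d, "unsynced.log")
        return {
            "fsync_seconds": write_records(synced_path, n, fsync_each=True),
            "no_fsync_seconds": write_records(unsynced_path, n, fsync_each=False),
            "bytes_synced": os.path.getsize(synced_path),
            "bytes_unsynced": os.path.getsize(unsynced_path),
        }


if __name__ == "__main__":
    r = compare()
    print(f"fsync per write: {r['fsync_seconds']:.4f}s, no fsync: {r['no_fsync_seconds']:.4f}s")
```

Both runs write the same bytes; what differs is the durability guarantee at the moment each write returns, which is why the two timings are not directly comparable.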

Is durability preserved when you lose replica connectivity around the same time as power to your CPU, as tends to happen?
Exactly. I will never, ever try MongoDB because of that. A database that does not fsync should not be called a database.
MongoDB moved on from mmap at version ~3.6. WiredTiger can be configured to fsync every commit. Enjoy trying MongoDB!

PS: I really miss working with MongoDB. It's been almost 7 years since I last used it. I'm surprised I don't see it mentioned very often anymore.

Last I heard of MongoDB, it was getting utterly buried by the Jepsen guy, and for anyone who follows distributed systems at a technical level, that is damning. He finds problems with everything, but that one was especially bad.

MongoDB has always seemed to place write consistency secondary to other priorities (mostly sales / read / features), which is frankly a crap way to build a database, much less a distributed one. And I am so sick of MongoDB basically saying "no, it's fixed in the new version," which is always a major red flag.

Right now it's getting its lunch eaten by Postgres's document interface from what I can tell.

a) Every distributed database has had serious issues with Jepsen.

b) MongoDB has been growing revenue ~40% year on year for the last few years.

c) PostgreSQL is only a serious competitor for MongoDB if you have small datasets. After all these years, PostgreSQL is still ridiculously poor when it comes to clustering, replication, etc. Everyone's solution of "just buy a bigger instance" is just laughable.

The owning company's revenue growth as an argument for a database? We have an Oracle fan here.
Jepsen does find problems with everything. So you have to know whether what is being discussed is serious and blatantly bad, or just the usual "wow, distributed is hard".

Which is why his papers are so great.

But the MongoDB one was "wow this is bad".

Every distributed database has been "wow this is bad".

I assume you have an example of one that wasn't?

Kafka doesn't do any stupid tricks; it just uses the underlying platform to its full potential: https://kafka.apache.org/documentation/#linuxflush

With the usual recommended settings, XFS filesystem, 3 replicas, 2 "in-sync" replicas, etc., it is rather safe. You can also tune background flush to your liking.
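The replication argument can be sketched as a toy model, assuming a producer using `acks=all`, a replication factor of 3 and `min.insync.replicas=2` (the "3 replicas, 2 in-sync replicas" settings mentioned above). The `Partition` class below is hypothetical: it ignores leader election, log truncation and re-fetching after a restart, and assumes no broker has fsynced the record, so a crashed broker simply loses its copy.

```python
class Partition:
    """Toy model of one partition's replicas (hypothetical, not Kafka's code)."""

    def __init__(self, replication_factor=3, min_insync_replicas=2):
        self.min_insync_replicas = min_insync_replicas
        self.in_sync = set(range(replication_factor))  # broker ids currently in the ISR
        self.holders = {}                               # offset -> brokers holding that record
        self.records = {}                               # offset -> payload
        self.next_offset = 0

    def produce_acks_all(self, payload):
        """Append with acks=all semantics; return the offset, or None if rejected."""
        if len(self.in_sync) < self.min_insync_replicas:
            return None  # real Kafka rejects such writes with an error; the model returns None
        offset = self.next_offset
        self.records[offset] = payload
        self.holders[offset] = set(self.in_sync)  # every in-sync replica received the record
        self.next_offset += 1
        return offset

    def crash(self, broker_id):
        """Power loss on one broker: it leaves the ISR and its unflushed copy is gone."""
        self.in_sync.discard(broker_id)
        for brokers in self.holders.values():
            brokers.discard(broker_id)

    def survives(self, offset):
        """A record is recoverable while at least one replica still holds it."""
        return bool(self.holders.get(offset))


if __name__ == "__main__":
    p = Partition()
    off = p.produce_acks_all(b"hello")
    p.crash(0)
    print(p.survives(off))               # still held by brokers 1 and 2
    p.crash(1)
    print(p.produce_acks_all(b"world"))  # rejected: only one in-sync replica left
    p.crash(2)
    print(p.survives(off))               # lost: every replica crashed before flushing
```

In this model an acknowledged record survives any single broker crash, and new writes are refused once too few replicas remain in sync; it is lost only when every replica holding it loses unflushed data at about the same time, which is the correlated power-loss case asked about in the thread.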

The above tradeoffs are very reasonable, and Kafka runs very fast on slow disks (magnetic or in the cloud) and even faster on SSD/NVMe disks.

Kafka is not a database....
Maybe you could say that if it acted like redis pub/sub and nothing was stored.
MongoDB has been doing fsync by default for over a decade now.

And those who had actually tried it were aware that every client enabled fsync out of the box. So in fact the entire situation was seriously overblown.

But sure, let irrational ideology affect your technology decisions. That will work out well.

Avoiding a database that has a proven historical record of disregarding data consistency and resorting to marketing gimmicks is "irrational ideology"?

Not everyone has time to review every single line of code in their tech stacks. Past reputation is important, and your replies here don't seem to be of much help to MongoDB's reputation as far as I can tell.