by ceencee 1135 days ago
Who is running single-AZ deployments and also cares about data loss and availability? Seriously? I’ve personally supported 1000s of Kafka deploys and this isn’t a thing, in the cloud at least. There is no call for wanting fsync per message, it is an anti pattern and isn’t done because it isn’t necessary. Data loss in Kafka isn't a real problem that hurts real-world users at all.

I was grabbing beer with a buddy who has run some large (petabytes per month) Kafka deployments, and his experience was very much that Kafka will lose acked writes if not very carefully configured. He had direct experience with data loss from JVM GC creating terrible, flaky cluster conditions and, separately, from running out of disk on one cluster machine.
> There is no call for wanting fsync per message, it is an anti pattern and isn’t done because it isn’t necessary

1. Don't have to do it by message

2. It's used by many distributed db engines; Kafka and (I think) zk are the outliers here, not the other way around
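
To make point 1 concrete, here is a minimal sketch of batching the fsync (often called group commit): write several records, call fsync once, and only then ack the whole batch. This is an illustration under my own assumptions, not how Kafka or any particular engine implements it; `append_batch`, `read_records`, and the 4-byte length-prefix framing are hypothetical names and choices made up for the example.

```python
import os


def append_batch(fd: int, records: list[bytes]) -> int:
    """Append length-prefixed records, then fsync once for the whole batch."""
    for rec in records:
        frame = len(rec).to_bytes(4, "big") + rec  # hypothetical framing
        sent = 0
        while sent < len(frame):  # os.write may write fewer bytes than asked
            sent += os.write(fd, frame[sent:])
    os.fsync(fd)  # one flush covers every record written above
    return len(records)  # caller should ack these records only after this returns


def read_records(path: str) -> list[bytes]:
    """Read the length-prefixed records back, to check the sketch."""
    out = []
    with open(path, "rb") as f:
        while header := f.read(4):
            out.append(f.read(int.from_bytes(header, "big")))
    return out
```

The cost of the fsync is amortized over the batch, so the trade-off is latency (a record waits for its batch to flush) rather than durability of acked writes.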

Kafka is not a "db engine". zk is a "db engine" in the same way 'DNS' is a "db engine".
Oh, DNS is definitely a database engine [1] ;)

[1]: https://dyna53.io

Ah yes, the semantic argument. FYI: Pulsar and etcd do use fsync.
No one is arguing with you. You were making an argument based on a misinformed software category assertion and the error was pointed out. So r/fyi/til maybe?
I can't name the "unserious" people who aren't running multi-AZ, but this is the approach to durability that MongoDB took ~15 years ago and they have never lived it down.

It may just be that data reliability isn't a huge concern for messaging queues, so it's less of an issue, but pretending the risk isn't there doesn't help anyone.