Hacker News new | ask | show | jobs
by antonmry 1512 days ago
This report seems to have some wrong insights. Auto-commit offsets doesn't imply dataloss if records are processed synchronously. This is the safest way to test Kafka instead of commit offsets manually
2 comments

Can you clarify what you mean? AFAIK with manual commit you have the most control over when the commit happens

Look at this blog post describing a data loss caused by auto-commit: https://newrelic.com/blog/best-practices/kafka-consumer-conf...

Also there also may be more subtle issues with auto-commit: https://github.com/edenhill/librdkafka/issues/2782

I'm afraid the article is also wrong, this is a typical misconception when working with Kafka. Offsets are committed in the next poll() invocation. If the previous messages weren't processed, a rebalance occurs and messages are processed by other instance. This is an implementation detail of the Java client library but it allows the at-least-once semantic with auto-commit. The book Effective Kafka has a better explanation.

librdkafka isn't part of official Kafka so it may have problems with this as it has other limitations.

In any case, the report isn't right about this and it doesn't use the safest options. Commit offsets manually is the most flexible way but it isn't easy, being the error more usual to commit offsets individually

> Offsets are committed in the next poll() invocation.

I'm a little surprised by this--not that you're necessarily wrong, but our tests consumed messages synchronously, and IIRC (pardon, it's been 3 months since I was working on Redpanda full time and my time to go get a repro case is a bit limited) did see lost messages with the default autocommit behavior. At some point I'll have to go dig into this again.

Almost nobody auto commits offsets in real applications though. If you do then you should really stop :)
I’m not sure this is true. I’ve probably spoken with 500+ teams myself and by and large folks use default settings.