by rystsov 1517 days ago
Can you clarify what you mean? AFAIK with manual commit you have the most control over when the commit happens

Look at this blog post describing a data loss caused by auto-commit: https://newrelic.com/blog/best-practices/kafka-consumer-conf...

There may also be more subtle issues with auto-commit: https://github.com/edenhill/librdkafka/issues/2782

I'm afraid the article is also wrong; this is a common misconception about Kafka. With auto-commit, offsets are committed in the next poll() invocation. If the previous messages weren't processed, a rebalance occurs and the messages are processed by another instance. This is an implementation detail of the Java client library, but it is what allows at-least-once semantics with auto-commit. The book Effective Kafka has a better explanation.
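
To make that claim concrete, here is a minimal sketch of the commit-on-next-poll idea. It is a toy model in plain Python, not the Kafka client API: `ToyConsumer`, `run`, and `crash_on` are hypothetical names, and the model ignores the commit interval, rebalancing, and commits on close. It only shows why a consumer that processes synchronously between polls gets redelivery rather than loss after a crash.

```python
class ToyConsumer:
    """Toy model of commit-on-next-poll; NOT the real Kafka client."""

    def __init__(self, log, batch_size, start=0):
        self.log = log                # the partition's messages, in order
        self.batch_size = batch_size
        self.position = start         # next offset to hand out
        self.committed = start        # offset a replacement consumer resumes from

    def poll(self):
        # Auto-commit step (assumed model): everything handed out by
        # earlier polls is committed at the start of this poll.
        self.committed = self.position
        batch = self.log[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch


def run(log, batch_size, crash_on=None, start=0):
    """Process each batch synchronously; optionally crash at `crash_on`."""
    consumer = ToyConsumer(log, batch_size, start)
    processed = []
    while True:
        batch = consumer.poll()
        if not batch:
            break
        for msg in batch:
            if msg == crash_on:
                # Crash before the next poll(): this batch is never committed.
                return processed, consumer.committed
            processed.append(msg)
    return processed, consumer.committed


log = ["m0", "m1", "m2", "m3", "m4", "m5"]
first, committed = run(log, batch_size=2, crash_on="m3")
# A replacement instance resumes from the last committed offset.
second, _ = run(log, batch_size=2, start=committed)
print(first, committed, second)
```

In this model the first instance crashes on m3 with offset 2 committed, so the replacement starts again at m2: m2 is processed twice and nothing is skipped. The guarantee depends on processing finishing before the next poll(); a model that hands messages to another thread and polls again would commit work that has not been done yet.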

librdkafka isn't part of the official Kafka project, so it may have problems here, just as it has other limitations.

In any case, the report isn't right about this, and it doesn't use the safest options. Committing offsets manually is the most flexible approach, but it isn't easy; the most common mistake is committing offsets one message at a time.

> Offsets are committed in the next poll() invocation.

I'm a little surprised by this--not that you're necessarily wrong, but our tests consumed messages synchronously, and IIRC (pardon, it's been 3 months since I was working on Redpanda full time and my time to go get a repro case is a bit limited) did see lost messages with the default autocommit behavior. At some point I'll have to go dig into this again.