	by antonmry 1512 days ago
	This report seems to draw some wrong conclusions. Auto-committing offsets doesn't imply data loss if records are processed synchronously. It is the safest way to test Kafka, rather than committing offsets manually.

2 comments

rystsov 1512 days ago

Can you clarify what you mean? AFAIK with manual commit you have the most control over when the commit happens

Look at this blog post describing a data loss caused by auto-commit: https://newrelic.com/blog/best-practices/kafka-consumer-conf...

There may also be more subtle issues with auto-commit: https://github.com/edenhill/librdkafka/issues/2782

antonmry 1512 days ago

I'm afraid the article is also wrong; this is a common misconception when working with Kafka. Offsets are committed in the next poll() invocation. If the previous messages weren't processed, a rebalance occurs and the messages are processed by another instance. This is an implementation detail of the Java client library, but it allows at-least-once semantics with auto-commit. The book Effective Kafka has a better explanation.
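
To make the commit-on-next-poll argument concrete, here is a toy Python model of it. This is only a sketch and not the Kafka client: `ToyConsumer`, `run`, and `crash_at` are made-up names, and the model assumes a commit happens on every poll(), ignoring the timing controlled by `auto.commit.interval.ms`.

```python
class ToyConsumer:
    """Toy model of commit-on-next-poll auto-commit (hypothetical, not the real client)."""

    def __init__(self, log, committed=0, batch_size=2):
        self.log = log              # records of a single partition
        self.committed = committed  # committed offset = next record to read after a restart
        self.position = committed   # next offset to hand out
        self.batch_size = batch_size

    def poll(self):
        # Auto-commit step: whatever the previous poll() returned is
        # committed only now, at the start of the next poll().
        self.committed = self.position
        batch = self.log[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch


def run(log, crash_at=None, committed=0):
    """Process records synchronously; optionally crash just before `crash_at`."""
    consumer = ToyConsumer(log, committed)
    processed = []
    while True:
        batch = consumer.poll()
        if not batch:
            break
        for record in batch:
            if record == crash_at:
                # Simulated crash mid-batch: this batch was never committed.
                return processed, consumer.committed
            processed.append(record)
    return processed, consumer.committed


log = ["a", "b", "c", "d", "e"]
first, committed = run(log, crash_at="d")    # first instance dies before "d"
second, _ = run(log, committed=committed)    # another instance takes over
```

In this model the first instance processes a, b, c and leaves the committed offset at 2, so the second instance resumes at c. Record c is processed twice but nothing is skipped, which is the at-least-once behavior described above. The argument depends on processing finishing before the next poll().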

librdkafka isn't part of the official Kafka project, so it may have problems with this, just as it has other limitations.

In any case, the report isn't right about this, and it doesn't use the safest options. Committing offsets manually is the most flexible approach, but it isn't easy; the most common mistake is committing offsets individually.
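
One possible reading of the "committing offsets individually" pitfall, sketched in Python under stated assumptions: records are processed concurrently and each one commits its own offset when it finishes. Because a committed offset is a single per-partition position (the next record to read), committing a later offset implicitly covers every earlier one. The function names below are hypothetical and nothing here calls a real Kafka API.

```python
def naive_commit(completion_order):
    """Commit each offset as soon as its record finishes (the pitfall)."""
    committed = 0
    for offset in completion_order:
        # offset + 1 is the position of the next record to read.
        committed = max(committed, offset + 1)
    return committed


def contiguous_commit(completed, start=0):
    """Commit only up to the first offset that has not finished yet."""
    done = set(completed)
    committed = start
    while committed in done:
        committed += 1
    return committed


# Offsets 0..3 are in flight; 2, 0 and 3 have finished, 1 has not.
finished = [2, 0, 3]
unsafe = naive_commit(finished)      # 4: a crash now would skip record 1
safe = contiguous_commit(finished)   # 1: record 1 is re-read after a crash
```

Tracking the contiguous prefix of finished offsets is one way to keep manual commits at-least-once; it trades some reprocessing after a crash for not skipping records.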

aphyr 1510 days ago

> Offsets are committed in the next poll() invocation.

I'm a little surprised by this--not that you're necessarily wrong, but our tests consumed messages synchronously, and IIRC (pardon, it's been 3 months since I was working on Redpanda full time and my time to go get a repro case is a bit limited) did see lost messages with the default autocommit behavior. At some point I'll have to go dig into this again.

relay23 1510 days ago

Almost nobody auto-commits offsets in real applications, though. If you do, then you should really stop :)

agallego 1507 days ago

I’m not sure this is true. I’ve probably spoken with 500+ teams myself and by and large folks use default settings.
