by alexatkeplar 3870 days ago
Thanks for the detailed explanation jganetsk.

> when creating a consumer, the sync point of a new consumer really should start from the very beginning of the topic, at a predictable explicit start point, rather than at the current end of the topic

I'll talk about Kinesis because that's the technology we use more at Snowplow. When creating a Kinesis consumer, I can specify where I want to start reading from: a) TRIM_HORIZON, which is the earliest event in the stream that hasn't yet expired (aka been "trimmed"); b) LATEST, i.e. the current end of the stream, which is the Cloud Pub/Sub capability; c) AT_SEQUENCE_NUMBER {x}, which means from the event in the stream with the given sequence number (offset ID); or d) AFTER_SEQUENCE_NUMBER {x}, which means from the event immediately after the one in c).
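
To make the four start positions concrete, here is a rough sketch using a toy in-memory stream. It is not the Kinesis API: `ToyStream`, `put`, `start_index` and `read` are names I made up for illustration, and I'm assuming a sequence number is simply the record's index.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyStream:
    """Hypothetical in-memory stand-in for one shard (not the Kinesis API)."""
    records: List[str] = field(default_factory=list)
    trimmed: int = 0  # how many of the oldest records have expired

    def put(self, data: str) -> int:
        # Assumption: the sequence number is just the list index.
        self.records.append(data)
        return len(self.records) - 1

    def start_index(self, iterator_type: str,
                    sequence_number: Optional[int] = None) -> int:
        if iterator_type == "TRIM_HORIZON":
            return self.trimmed            # oldest record still retained
        if iterator_type == "LATEST":
            return len(self.records)       # only records written from now on
        if iterator_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
            if sequence_number is None:
                raise ValueError("sequence_number is required")
            if sequence_number < self.trimmed:
                raise ValueError("that record has already been trimmed")
            skip = 1 if iterator_type == "AFTER_SEQUENCE_NUMBER" else 0
            return sequence_number + skip
        raise ValueError(f"unknown iterator type: {iterator_type}")

    def read(self, iterator_type: str,
             sequence_number: Optional[int] = None) -> List[str]:
        # Return everything from the chosen start position to the end.
        return self.records[self.start_index(iterator_type, sequence_number):]


stream = ToyStream()
for event in ["a", "b", "c", "d"]:
    stream.put(event)
stream.trimmed = 1  # pretend "a" has expired

print(stream.read("TRIM_HORIZON"))              # ['b', 'c', 'd']
print(stream.read("LATEST"))                    # []
print(stream.read("AT_SEQUENCE_NUMBER", 2))     # ['c', 'd']
print(stream.read("AFTER_SEQUENCE_NUMBER", 2))  # ['d']
```

The point is just that the start position is an explicit choice made by the consumer at creation time, not something the stream decides for it.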

Kinesis streams or Kafka topics don't themselves care about the progress of any individual consumer - consumers are responsible for tracking their own position in the stream via sequence numbers / offset IDs.
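
A minimal sketch of what "consumers track their own position" means in practice, again with made-up names (`CheckpointingConsumer`, `poll`, and a plain dict standing in for whatever durable checkpoint store a real consumer would use):

```python
from typing import Dict, List


class CheckpointingConsumer:
    """Hypothetical consumer that owns its position; the log never stores it."""

    def __init__(self, name: str, checkpoints: Dict[str, int]):
        self.name = name
        # Assumption: a dict stands in for a durable checkpoint store.
        self.checkpoints = checkpoints

    def poll(self, log: List[str], max_records: int = 10) -> List[str]:
        # Resume from this consumer's own checkpoint (0 if it has none yet).
        start = self.checkpoints.get(self.name, 0)
        batch = log[start:start + max_records]
        # Advance only this consumer's position; the log itself is untouched.
        self.checkpoints[self.name] = start + len(batch)
        return batch


log = ["e0", "e1", "e2"]
store: Dict[str, int] = {}

a = CheckpointingConsumer("a", store)
b = CheckpointingConsumer("b", store)

print(a.poll(log, max_records=2))  # ['e0', 'e1']
print(b.poll(log))                 # ['e0', 'e1', 'e2']

# A "restarted" consumer "a" picks up where its checkpoint left off.
print(CheckpointingConsumer("a", store).poll(log))  # ['e2']
```

Each consumer moves through the same log at its own pace, and reading never removes anything from the log.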

> It doesn't necessarily make sense to retain all tweets forever by default (although there certainly are use cases for that)

Completely agree. I think a good point of distinction between pub/sub systems and a unified log is: use pub/sub when the messages are a means-to-an-end (which is feeding one or more downstream apps); use a unified log when the events are an end-in-themselves (i.e. you would still want to preserve the events even if there were no consumers live).

Anyway, I could talk about this stuff all day :-) - if you'd like to chat further, my details are in my profile!