by biokoda 3901 days ago
If the consumer is stateless then it needs to acknowledge every received event for delivery to be reliable. Otherwise the producer may think something was sent when it actually never arrived (the TCP connection was closed).

So it's either unreliable or slow.

Also, if you have dynamic, transient worker topologies, you have to remember those positions. You are saving data for a consumer that may never come back. How long do you keep this data?

Seems like a pretty messy way of doing things.

Completely agree about LevelDB.

2 comments

TCP would guarantee delivery, but you're right in that you wouldn't know if the consumer actually did anything with the message. It could have crashed on parsing or something.

But moving the concern to the consumer to track the cursor doesn't make the protocol any more stable. To keep a stable cursor, the consumer would need to persist that someplace, which just pushes the acknowledgement to that persistence component instead. If a stable cursor is what you're after, then co-locating it with the durable queue provides a simpler solution with a slightly better consistency guarantee.

The garbage collection problem is a real one, but realistically how many consumers is an infrastructure service like this going to have? Tens? Hundreds? Thousands? Millions? Billions?

No matter which one of those you pick, it's a trivially small secondary index to maintain even if you never reaped it. I mean it's a K/V problem (consumer_id -> queue_offset) and there's a K/V store already sitting there. If you didn't want it to grow forever then you could establish a TTL policy via configuration.
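A rough sketch of what that index could look like, assuming an in-memory dict stands in for the K/V store. `CursorStore`, `commit`, `position` and `reap` are made-up names for illustration, not anything from a real queue:

```python
import time


class CursorStore:
    """Hypothetical consumer_id -> queue_offset index with optional TTL reaping."""

    def __init__(self, ttl_seconds=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # None means cursors are never reaped
        self.clock = clock              # injectable so the TTL can be tested
        self._cursors = {}              # consumer_id -> (queue_offset, last_seen)

    def commit(self, consumer_id, queue_offset):
        # Record the consumer's position and refresh its last-seen time.
        self._cursors[consumer_id] = (queue_offset, self.clock())

    def position(self, consumer_id, default=0):
        # Unknown (or already reaped) consumers start from the default offset.
        entry = self._cursors.get(consumer_id)
        return default if entry is None else entry[0]

    def reap(self):
        # Drop cursors not committed within the TTL; return how many were dropped.
        if self.ttl_seconds is None:
            return 0
        cutoff = self.clock() - self.ttl_seconds
        stale = [cid for cid, (_, seen) in self._cursors.items() if seen < cutoff]
        for cid in stale:
            del self._cursors[cid]
        return len(stale)
```

A consumer that shows up again after being reaped simply starts from the default offset, which is the trade-off a TTL policy makes.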

The problem you would have is consumers that don't have stable or bounded IDs, like a system that assigns a new ID every time the consumer makes a request or is restarted.

> TCP would guarantee delivery

Calling send just copies the buffer to the kernel/driver. When the call returns you do not know how much of it has actually been sent. You can end up with the producer thinking the data was sent when it in fact never made it onto the network.
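To make that concrete, here is a small sketch using Python's standard `socket` module. The length-prefixed framing and the `ACK` reply are assumptions made up for the example, not part of any protocol discussed here. The point is that `sendall` returning says nothing about the peer; only the explicit reply does:

```python
import socket


def recv_exact(sock, n):
    # Read exactly n bytes or raise if the peer closes the connection first.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_message(sock, payload):
    # Returns once the kernel has accepted the bytes, not once the peer has them.
    sock.sendall(len(payload).to_bytes(4, "big") + payload)


def wait_for_ack(sock, timeout):
    # True only if the peer explicitly acknowledged within the timeout.
    sock.settimeout(timeout)
    try:
        return recv_exact(sock, 3) == b"ACK"
    except OSError:  # covers timeouts and closed or reset connections
        return False


def receive_and_ack(sock):
    # Consumer side: read one framed message, then acknowledge it.
    size = int.from_bytes(recv_exact(sock, 4), "big")
    payload = recv_exact(sock, size)
    sock.sendall(b"ACK")
    return payload
```

With `socket.socketpair()` you can see it directly: `send_message` returns immediately even though the other end has not read anything, and `wait_for_ack` only succeeds after `receive_and_ack` has run on the peer.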

If a reliable fetch fails, each consumer group will keep its own queue of failed deliveries (persisted on disk), and will check that queue and serve those failed items first.
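A minimal sketch of that retry-first behaviour, assuming a shared in-memory list as the event log and a `deque` standing in for the on-disk failed queue. `ConsumerGroup`, `fetch` and `mark_failed` are hypothetical names:

```python
from collections import deque


class ConsumerGroup:
    """Hypothetical consumer group that retries failed deliveries before new items."""

    def __init__(self, log):
        self.log = log         # shared append-only list of events
        self.offset = 0        # next unread position in the log for this group
        self.failed = deque()  # stand-in for the per-group queue persisted on disk

    def fetch(self):
        # Failed deliveries are served first, in the order they failed.
        if self.failed:
            return self.failed.popleft()
        if self.offset < len(self.log):
            item = self.log[self.offset]
            self.offset += 1
            return item
        return None  # nothing to deliver right now

    def mark_failed(self, item):
        # Called when a delivery was not acknowledged by the consumer.
        self.failed.append(item)
```

So with a log of `["a", "b", "c"]`, fetching `"a"`, marking it failed, and fetching again yields `"a"` once more before `"b"`.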