by ssd532 998 days ago
I am contemplating this exact topic for my project at this moment. It would be great if you could briefly explain what, as per your understanding, stream vs queue semantics are. I am studying it and have found somewhat confusing discussions on the internet and in in-person forums.

Ok so the ELI5 is that a stream essentially has to be consumed in order while a queue can be processed out of order.

This is a gross oversimplification, as all ELI5s are, but it's a decent rule regardless.

The reason for this is that streaming systems by and large function on some sort of offset mechanism. When your client is receiving messages, it is generally calling something similar to poll(fromOffset, max) to get some messages and then keeping track of the highest offset it has processed somewhere (Kafka has consumer groups to help you store your offsets).

The problem with this model is that you can generally only a) get messages in order on a given topic partition starting from some offset and b) "commit" the latest message you processed.
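To make the offset model concrete, here is a rough sketch in Python. Partition, poll and consume are made-up names for a toy in-memory log, not any real client API, and I'm assuming the committed offset means "next offset to read". The point is just that progress is a single number, so a failure at offset N means nothing after N can be committed.

```python
class Partition:
    """Toy in-memory stand-in for one topic partition (hypothetical, not a real client)."""

    def __init__(self, messages):
        self._log = list(messages)

    def poll(self, from_offset, max_messages):
        """Return up to max_messages (offset, message) pairs, in order, from from_offset."""
        end = min(from_offset + max_messages, len(self._log))
        return [(i, self._log[i]) for i in range(from_offset, end)]


def consume(partition, committed, handler, batch_size=2):
    """Process messages in order and return the new committed offset.

    Assumption: the committed offset is the next offset to read, and handler
    returns True on success and False on failure.
    """
    while True:
        batch = partition.poll(committed, batch_size)
        if not batch:
            return committed  # caught up with the end of the log
        for offset, message in batch:
            if not handler(message):
                # Progress is one number, so we cannot commit past a failure.
                return committed
            committed = offset + 1


partition = Partition(["a", "b", "bad", "c"])
committed = consume(partition, 0, lambda m: m != "bad")
# committed stops at the offset of "bad"; "c" is never reached.
```

Calling consume again with the same committed offset just hits the same failing message first, which is where the trouble described further down comes from.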

This is fine if the chance of failure is the same for all messages, e.g. streaming database updates into a backup. Either the backup target is available or it's not; if one message fails, all would likely fail.

On the other end of the spectrum you have something like a queue of webhook jobs to execute against 3rd party/user supplied targets. The chance of any given webhook failing is entirely divorced from the rest in the queue.

So if you were to try to use a stream for the webhook case, you would quickly get blocked on the first bad webhook server you ran into. With a proper queueing system, you could instead kick that job back with a delay and process it again later, without blocking work on other tasks or losing the ability to commit which tasks have been processed.

This is generally called the head-of-line blocking problem.
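A small simulation of the webhook case, again only a sketch: run_stream, run_queue, retry_delay and max_attempts are names I made up, the clock is simulated via ready times on a heap, and real queueing systems handle redelivery in their own ways. It shows the in-order consumer stalling on the first bad job while the queue consumer pushes that job back and keeps going.

```python
import heapq


def run_stream(jobs, succeeds):
    """In-order consumer: stops at the first failing job (head-of-line blocking)."""
    done = []
    for job in jobs:
        if not succeeds(job):
            break  # cannot move past this job, so everything behind it waits
        done.append(job)
    return done


def run_queue(jobs, succeeds, retry_delay=5, max_attempts=3):
    """Queue consumer: a failed job is re-enqueued with a delay; others proceed.

    Heap entries are (ready_time, sequence, job, attempts). Time is simulated,
    and sequence keeps ordering stable for jobs that share a ready_time.
    """
    heap = [(0, seq, job, 0) for seq, job in enumerate(jobs)]
    heapq.heapify(heap)
    next_seq = len(jobs)
    done, dead = [], []
    while heap:
        ready_at, _, job, attempts = heapq.heappop(heap)
        if succeeds(job):
            done.append(job)  # acknowledged individually, in any order
        elif attempts + 1 < max_attempts:
            # Kick the job back with a delay instead of blocking the rest.
            heapq.heappush(heap, (ready_at + retry_delay, next_seq, job, attempts + 1))
            next_seq += 1
        else:
            dead.append(job)  # give up after max_attempts tries
    return done, dead


jobs = ["a", "bad", "b", "c"]
works = lambda job: job != "bad"
stream_done = run_stream(jobs, works)      # only "a" gets processed
queue_done, queue_dead = run_queue(jobs, works)  # "a", "b", "c" done; "bad" given up on
```

The thing that makes this possible is per-job acknowledgement: the queue tracks each job's state on its own rather than a single offset.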