by sumtechguy 2082 days ago
Having used both types: turning the pub/sub into a queue has some advantages in debugging and processing. Kafka has the idea of each queue (a topic, in Kafka terms) being partitioned by hash key, which means you can have a bunch of processes reading from the same queue without stepping on each other. It is basically sharding at the data stream level, with ordering guaranteed within each partition. It is a neat concept. Another advantage is playback. Kafka uses a group ID and offset to keep track of where you are, and messages are decently hard to lose because they stick around, so you can play back by just moving the offset. The offset update is maybe 10 bytes into a memory-backed file store. At first I too was skeptical of the performance, but it scales very nicely and lets you scale a topic horizontally as well as vertically. In the background each message has an expiry time, so maybe you only keep it for one week, or you can set it to last years. For retention that long you would be better off putting the data in a DB table, though.
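To make the partition/offset idea concrete, here is a rough toy model in plain Python. It is not the Kafka client API: the `Topic` class and its `send`, `poll`, and `seek` methods are names made up for this sketch, and it hashes keys with CRC32 purely for illustration rather than whatever partitioner a real cluster uses.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory stand-in for a partitioned topic (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # (group_id, partition) -> index of the next message to read.
        self.offsets = defaultdict(int)

    def partition_for(self, key):
        # Stable hash, so the same key always lands in the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group_id, partition):
        """Return this group's unread messages and advance its offset."""
        log = self.partitions[partition]
        start = self.offsets[(group_id, partition)]
        self.offsets[(group_id, partition)] = len(log)
        return log[start:]

    def seek(self, group_id, partition, offset):
        # Playback is just moving the stored offset; the log is untouched.
        self.offsets[(group_id, partition)] = offset


topic = Topic(num_partitions=3)
for i in range(5):
    topic.send("user-42", f"event-{i}")

p = topic.partition_for("user-42")
first = topic.poll("billing", p)   # all five events, in send order
again = topic.poll("billing", p)   # nothing new, offset is at the end
topic.seek("billing", p, 0)        # rewind to the start
replayed = topic.poll("billing", p)
```

Every message for `user-42` ends up in one partition in send order, and rewinding only touches the small per-group offset record, which is why playback is cheap.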

Idempotency is a good idea even in a system like this, but it is not always possible, as your upstream data sources may be something very different.
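For what it's worth, one common way to get idempotent processing is to remember which message IDs you have already handled. The sketch below assumes the upstream source supplies a stable, unique `message_id`, which is exactly the thing you cannot always count on. `IdempotentHandler` and the sample deliveries are hypothetical names and data for illustration only.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed on an assumed-unique message ID."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        # A redelivered or replayed message is skipped instead of applied twice.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.balance += amount
        return True


handler = IdempotentHandler()
# "m1" shows up twice, as it might after a replay or a retry.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]
applied = [handler.handle(mid, amt) for mid, amt in deliveries]
```

In a real system the seen-ID set would have to live somewhere durable and be updated together with the side effect, otherwise a crash between the two steps brings the duplicate problem right back.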