Kafka is amazing for what it is made for, but it doesn't seem to solve the problem of querying past states. For example, if there are intermediate details related to the "in-progress" state that get overwritten once the job transitions into the "complete" state (or "error" state), then those details are non-trivial to query from Kafka.

Even if you decide to keep those intermediate states in the main table, there are other niggles, like retries. If a job gets picked up and fails, I might write to an `error_details` column in the main table. However, if I have retries and the job fails a couple of times, only the latest error details are in the main table. If I want to reconstruct the history of the job, I have to somehow retrieve each error event for that job from my append-only log. And now I'm querying across systems and combining the data in the application tier.
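To make the retry case concrete, here is a rough sketch of what I mean. Everything in it is an assumption for illustration: the `jobs` table layout, the `record_failure` and `job_history` helpers, and the in-memory `event_log` list, which is only a stand-in for a real append-only log such as a Kafka topic.

```python
import sqlite3

# Hypothetical "main table": it only ever holds the current state of a job.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, error_details TEXT)"
)
db.execute("INSERT INTO jobs (id, state) VALUES (1, 'in-progress')")

# Hypothetical stand-in for the append-only log (e.g. a Kafka topic).
event_log = []


def record_failure(job_id, details):
    """Append the error event to the log, then overwrite the main table row."""
    event_log.append({"job_id": job_id, "type": "error", "details": details})
    db.execute(
        "UPDATE jobs SET state = 'error', error_details = ? WHERE id = ?",
        (details, job_id),
    )


def job_history(job_id):
    """Combine both sources in the application tier."""
    state, latest_error = db.execute(
        "SELECT state, error_details FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    all_errors = [
        e["details"]
        for e in event_log
        if e["job_id"] == job_id and e["type"] == "error"
    ]
    return {"state": state, "latest_error": latest_error, "all_errors": all_errors}


# Two failed attempts: the table keeps only the second error.
record_failure(1, "timeout")
record_failure(1, "connection reset")
history = job_history(1)
```

The main table only knows about the last failure, so `job_history` has to scan the log and stitch the two results together itself, which is exactly the cross-system join I'd rather not be writing by hand.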

I'm not saying these aren't solvable problems or that tools don't already exist that can achieve what I'm talking about. Engineers love to say "why don't you just ..." for almost any conceivable problem. What I mean to say is that we seem to be separating things into different systems (append-only logs vs. RDBMSs) which feel like they might be more tightly related. An RDBMS is like one half and an append-only log is the other half. Maybe one day those halves will be combined.