Hacker News new | ask | show | jobs
by zoogeny 969 days ago
I was thinking about this yesterday and your tip about TSTZRANGE is a good one. My main concern with audit table is that now every write becomes a transaction (if you are concerned with consistency between you master table and your audit table) and you have to decide on what content goes in your master table and what goes in your audit table. When I tried to sketch this out for my use case it turns out to be non-trivial.

I was thinking about this specifically for a jobs table where the jobs go through states (e.g. pending, in-progress, completed, failed). Having an audit of each of those states alongside relevant details might be useful. However, once I start hammering out the details I find myself reaching for pl/pgsql and worrying about putting too much business logic into my db. It starts to feel like I'm building a state machine on top of SQL.

I actually think this dual use-case is something we can really improve upon. In some fever-dream what I think I want is some combination of Kafka (append-only event stream) and Postgres (durable/reliable snapshot of current state).

1 comments

> I actually think this dual use-case is something we can really improve upon. In some fever-dream what I think I want is some combination of Kafka (append-only event stream) and Postgres (durable/reliable snapshot of current state).

Debezium will capture SQL changes to Kafka for you.

Kafka is amazing for what it is made for, but it doesn't seem to solve the query problem for past states. For example, if there is some intermediary details related to the "in-progress" state that will get overwritten once the job transitions into the "complete" state (or "error" state) then that is non-trivial to query from Kafka.

Even in you decide to keep those intermediary states in the main table then there are other niggles, like retries. If a job gets picked up and fails then I might write to an `error_details` column in the main table. However, if I have retries and the job fails a couple of times then only the latest error details are in the main table. If I want to reconstruct the history of the job I have to somehow retrieve each error event for that job from my append only log. And now I'm querying across systems and combining the data in the application tier.

I'm not saying these aren't solvable problems or that there doesn't exist tools already that can achieve what I'm talking about. Engineers love to say "why don't you just ..." for almost any conceivable problem. What I mean to say is that we seem to be separating things into different systems (append only logs vs. rdbms) which feel like they might be more tightly related. rdbms are like one half and append only logs are the other half. Maybe one day those halves will be combined.

Yes, Debezium is an implementation of the Transaction log tailing pattern an alternative to Transactional outbox pattern.