| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by postgressomethi 1041 days ago
	Polling an updated_at column is not robust in its most simple form, as transactions are not guaranteed to commit in that order.

3 comments

_acco 1041 days ago

Author here. Good point. For those that are curious, parent is referring to the following situation:

1. Transaction A starts, its before trigger fires, Row 1 has its updated_at timestamp set to 2023-09-22 12:00:01.

2. Transaction B starts a moment later, its before trigger fires, Row 2 has its updated_at timestamp set to 2023-09-22 12:00:02.

3. Transaction B commits successfully.

4. Polling query runs, sees Row 2 as the latest change, and updates its cursor to 2023-09-22 12:00:02.

5. Transaction A then commits successfully.

A simple way to avoid this issue is to not poll close to real-time, as the order is eventually consistent.

Perhaps a more robust suggestion would be to use a sequence? Imagine a new column, `updated_at_idx`, that incremented every time a row was changed.

link

postgressomethi 1041 days ago

Sequences kind of have the same issue, because you don't know if a gap is because of a rollback or an uncommitted transaction. Though with some logic you can do a pretty good job at this with sequences. And then you're not in the realm of "simple" anymore, at all.

link

_acco 1041 days ago

Any ideas for a simple polling implementation that's more robust?

link

farsa 1041 days ago

It's not exactly simple as it involves some postgres specific knowledge, but you can make it reliable when working with transaction ids (see https://event-driven.io/en/ordering_in_postgres_outbox/).

link

qazxcvbnm 1041 days ago

I've had pretty much the exact same problem and what I went for in my low-volume case was to simply add advisory locks such that I can guarantee the transaction start times provide correct ordering.

link

oconnore 1041 days ago

Set the trigger to add the primary key + change time to a separate table, then scan/truncate that table to poll changes.

link

iknownothow 1041 days ago

Woah, that's news to me. Is that true even if triggers are used to update a column?

  CREATE OR REPLACE FUNCTION
      update_updated_at_function()
  RETURNS TRIGGER AS $$
  BEGIN
      NEW.updated_at = now();
      RETURN NEW;
  END;
  $$ language 'plpgsql';


  CREATE TRIGGER
      update_updated_at_trigger
  BEFORE INSERT OR UPDATE ON
      "my_schema"."my_table"
  FOR EACH ROW EXECUTE PROCEDURE
      update_updated_at_function();
  END $$;

Is it possible for two rows to have `updated_at` timestamps that are different from the transaction commit order even if the above function and trigger are used? It's alright if `updated_at` and the commit timestamp are not the same, but the `updated_at` must represent commit order accurate to the millisecond/microsecond.

link

smilliken 1041 days ago

To confirm your fear, you can't use the updated_at timestamp as a proxy for commit order. The commits happen in a different order, and can be arbitrarily far apart, like hours or days depending on how long your transactions can last.

link

singron 1041 days ago

now() is the timestamp the transaction began at. There is no function to return the commit timestamp because you have to write the value before you commit.

link

physicles 1041 days ago

For polling, instead of updated_at, I use a _txid column that gets set by a trigger to the current transaction ID. Then, when polling, use txid_current() to see which transactions have committed and which haven’t. It’s a little dicey and super easy to hit fencepost errors, but it’s been running smoothly in production for a few years.

link