Hacker News new | ask | show | jobs
by sa46 1124 days ago
Ah, I've had this same use case: I store many versions of a resource from an external API, but I typically only want the most recent version. I've used the following techniques:

Select the ID with the max sync_token. Easiest for Postgres to optimize, assuming you have a primary key of (id, sync_token).

    SELECT *
    FROM external_api
    WHERE id = :my_id 
      AND sync_token = (SELECT max(sync_token) FROM ext_api WHERE id = :my_id)
Define a view using DISTINCT ON. Convenient for ad-hoc querying. Postgres usually figures out it can use the primary key to avoid a full-table scan.

    SELECT DISTINCT ON (id) *
    FROM external_api
    WHERE id = :my_id 
      AND sync_token = (SELECT max(sync_token) FROM ext_api WHERE id = :my_id)
    ORDER BY id, sync_token DESC
For tricky predicates, I use a trigger to track the most recent resource in a separate table. This is a hacky version of incremental view maintenance. [1]

[1]: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...

2 comments

Yup exactly. I also keep an eye on the progress of incremental views because there are a lot of use cases. Where possible I avoid triggers, but I’ve also used that solution too.

I’ve found that Postgres often gives the best results if you lateral join to the version table. That way it starts with the primary rows you need and it just hammers the index once for each version row.

> Postgres often gives the best results if you lateral join to the version table

Agreed.

  > SELECT *
  > FROM external_api
  > WHERE id = :my_id 
  >   AND sync_token = (SELECT max(sync_token) FROM ext_api WHERE id = :my_id)
Does this have an advantage over

  SELECT * FROM external_api
    WHERE id = :my_id
    ORDER BY sync_token DESC
    LIMIT 1
? Assuming the index is on (id, sync_token).
For my specific setup and a single row lookup, the ORDER BY ... LIMIT 1 is faster (0.1 ms vs. 1.2 ms).

The ORDER BY ... LIMIT 1 one is the same as the DISTINCT ON query, but the DISTINCT ON can return more than one resource.