by aidos 1129 days ago
I use multi-column indexes for things like, “find me the most recent version of this template for this customer”.

I really wish pg had a way to do partial indexes with limits so I could create a partial index that stores, for example, only the most recent version of something (I find this comes up a lot).

4 comments

Ah, I've had this same use case: I store many versions of a resource from an external API, but I typically only want the most recent version. I've used the following techniques:

Select the ID with the max sync_token. Easiest for Postgres to optimize, assuming you have a primary key of (id, sync_token).

    SELECT *
    FROM external_api
    WHERE id = :my_id
      AND sync_token = (SELECT max(sync_token) FROM external_api WHERE id = :my_id)

Define a view using DISTINCT ON. Convenient for ad-hoc querying. Postgres usually figures out it can use the primary key to avoid a full-table scan.

    SELECT DISTINCT ON (id) *
    FROM external_api
    ORDER BY id, sync_token DESC

For tricky predicates, I use a trigger to track the most recent resource in a separate table. This is a hacky version of incremental view maintenance. [1]

[1]: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...
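
A minimal sketch of the trigger approach, written against Python's built-in sqlite3 so it runs without a server. SQLite is only a stand-in here: in Postgres the trigger would call a PL/pgSQL function instead. The side table `external_api_latest`, the `payload` column, and the trigger name are hypothetical, and the sketch only handles inserts, not updates or deletes.

```python
import sqlite3

# SQLite stands in for Postgres; table, column and trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE external_api (
    id         INTEGER NOT NULL,
    sync_token INTEGER NOT NULL,
    payload    TEXT,
    PRIMARY KEY (id, sync_token)
);

-- Side table: one row per id, holding only the most recent version.
CREATE TABLE external_api_latest (
    id         INTEGER PRIMARY KEY,
    sync_token INTEGER NOT NULL,
    payload    TEXT
);

CREATE TRIGGER track_latest AFTER INSERT ON external_api
BEGIN
    -- Overwrite the tracked row only when the new version is newer.
    UPDATE external_api_latest
       SET sync_token = NEW.sync_token, payload = NEW.payload
     WHERE id = NEW.id AND sync_token < NEW.sync_token;

    -- First version seen for this id: create the tracked row.
    INSERT INTO external_api_latest (id, sync_token, payload)
    SELECT NEW.id, NEW.sync_token, NEW.payload
     WHERE NOT EXISTS (SELECT 1 FROM external_api_latest WHERE id = NEW.id);
END;
""")

# Versions may arrive out of order; the trigger keeps the highest sync_token.
rows = [(1, 10, "a"), (1, 12, "c"), (1, 11, "b"), (2, 5, "x")]
conn.executemany(
    "INSERT INTO external_api (id, sync_token, payload) VALUES (?, ?, ?)", rows
)

latest = conn.execute(
    "SELECT id, sync_token, payload FROM external_api_latest ORDER BY id"
).fetchall()
print(latest)
```

Reads of the latest version then become a plain primary-key lookup on the side table, at the cost of extra work on every write.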

Yup exactly. I also keep an eye on the progress of incremental views because there are a lot of use cases. Where possible I avoid triggers, but I’ve used that solution too.

I’ve found that Postgres often gives the best results if you lateral join to the version table. That way it starts with the primary rows you need and it just hammers the index once for each version row.

> Postgres often gives the best results if you lateral join to the version table

Agreed.

  > SELECT *
  > FROM external_api
  > WHERE id = :my_id
  >   AND sync_token = (SELECT max(sync_token) FROM external_api WHERE id = :my_id)

Does this have an advantage over

  SELECT * FROM external_api
    WHERE id = :my_id
    ORDER BY sync_token DESC
    LIMIT 1
? Assuming the index is on (id, sync_token).

For my specific setup and a single row lookup, the ORDER BY ... LIMIT 1 is faster (0.1 ms vs. 1.2 ms).

The ORDER BY ... LIMIT 1 query gives the same result as the DISTINCT ON query for a single id, but DISTINCT ON can return the most recent row for more than one resource at once.

Interesting, how did you achieve that with other databases? An enhancement opportunity for PostgreSQL?

For a similar-sounding problem, I used a view with row_number() OVER (PARTITION BY ... ORDER BY ... DESC), so that row number 1 was always the most recent row within the partition columns and older rows were 2, 3, etc. I could then query for row number = 1 to get the most recent row (or 2 for the second most recent, etc.). For the most recent only, I had a second view with that row number = 1 condition, and that was the one I used most frequently to access data.

To get an index, the view could be materialized, but then it needs to be refreshed, and in my experience that had more overhead than just using the regular view.
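
The row_number() views described above might look roughly like this. It is a sketch, not the commenter's actual schema: the table `template_version`, its columns, and the view names are made up for illustration, and SQLite (3.25 or newer, via Python's sqlite3) stands in for Postgres. The view definitions themselves use only standard window-function syntax.

```python
import sqlite3

# SQLite stands in for Postgres; all table, column and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE template_version (
    customer_id INTEGER NOT NULL,
    template_id INTEGER NOT NULL,
    version     INTEGER NOT NULL,
    body        TEXT,
    PRIMARY KEY (customer_id, template_id, version)
);

-- rn = 1 is the most recent version within each (customer, template) pair,
-- rn = 2 the one before it, and so on.
CREATE VIEW template_ranked AS
SELECT customer_id, template_id, version, body,
       row_number() OVER (
           PARTITION BY customer_id, template_id
           ORDER BY version DESC
       ) AS rn
FROM template_version;

-- Convenience view for the common case: only the most recent version.
CREATE VIEW template_latest AS
SELECT customer_id, template_id, version, body
FROM template_ranked
WHERE rn = 1;
""")

conn.executemany(
    "INSERT INTO template_version VALUES (?, ?, ?, ?)",
    [(1, 7, 1, "v1"), (1, 7, 2, "v2"), (1, 7, 3, "v3"), (2, 7, 1, "other")],
)

latest_rows = conn.execute(
    "SELECT customer_id, template_id, version, body "
    "FROM template_latest ORDER BY customer_id"
).fetchall()

second_most_recent = conn.execute(
    "SELECT version FROM template_ranked "
    "WHERE customer_id = 1 AND template_id = 7 AND rn = 2"
).fetchall()

print(latest_rows, second_most_recent)
```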

I haven't used them myself, but Postgres supports table partitioning. If you have large amounts of data, you could partition by a date range.

https://www.postgresql.org/docs/current/ddl-partitioning.htm...

> for example, only the most recent version of something

Inherently cost prohibitive. Maintaining the index after a delete is an O(n) operation.

If you would like to take on some of that burden yourself, you can add a `latest` bool flag and create a partial index on that.
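
One way the flag-plus-partial-index idea could be sketched, again with Python's sqlite3 standing in for Postgres (both support `CREATE INDEX ... WHERE`). The `payload` column, the index name, and the `add_version` helper are hypothetical, and the helper assumes versions arrive in increasing `sync_token` order.

```python
import sqlite3

# SQLite stands in for Postgres; column, index and helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE external_api (
    id         INTEGER NOT NULL,
    sync_token INTEGER NOT NULL,
    payload    TEXT,
    latest     INTEGER NOT NULL DEFAULT 0,  -- boolean flag: 1 = most recent version
    PRIMARY KEY (id, sync_token)
);

-- Partial index: only flagged rows are stored in it.
-- UNIQUE additionally enforces at most one flagged row per id.
CREATE UNIQUE INDEX external_api_latest_idx
    ON external_api (id) WHERE latest = 1;
""")


def add_version(conn, resource_id, sync_token, payload):
    """Insert a new version and move the flag to it in one transaction.

    Assumes the new sync_token is higher than every existing one for this id.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE external_api SET latest = 0 WHERE id = ? AND latest = 1",
            (resource_id,),
        )
        conn.execute(
            "INSERT INTO external_api (id, sync_token, payload, latest) "
            "VALUES (?, ?, ?, 1)",
            (resource_id, sync_token, payload),
        )


add_version(conn, 1, 10, "a")
add_version(conn, 1, 11, "b")
add_version(conn, 2, 5, "x")

flagged = conn.execute(
    "SELECT id, sync_token, payload FROM external_api "
    "WHERE latest = 1 ORDER BY id"
).fetchall()
print(flagged)
```

Deleting a flagged row leaves that id with no flagged row until the application promotes an older version, which is presumably the burden being referred to.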