Hacker News new | ask | show | jobs
by agentgt 3574 days ago
One of the tricks I have found for Postgres to manage analytics and unstructured data is using its inheritance with check constraints aka partitioning features. I mention it because not many people seem to know about this bad ass feature of postgres.

We use inheritance (with check constraints) [1] for both time partitioning as well as for custom (aka unstructured) events. Most events have several (100s in our case) common columns. When you see a new custom event with custom fields you want to separate out you can create a subtable and have it inherit from the base event table.

Now querying for those custom events is extremely fast and you can query from the base event type or the subtable directly.

[1]: https://www.postgresql.org/docs/current/static/ddl-partition...

1 comments

How do you manage the partitioning? We've thought of partitioning events (by time and/or by type of event) but we haven't found an easy way to migrate to partitioned tables and we aren't exactly sure how we would maintain the partitioning.
It depends on how you partition. If you partition with no extra columns which is the typical case for partitioning by time you can use triggers and insert on the base table. The postgresql doc covers this.

However if you add columns to your subtables (which is usually the case for adding events by type) you will need to insert into the correct table.

You can enforce this by making it so you can never insert into the parent table

    -- FORCE programmatic dispatch to table (v9.2).
    ALTER TABLE event ADD CONSTRAINT event_no_insert CHECK (false) no inherit;
Now you have to insert in the chid tables. This requires business logic on your end.

As far as refactoring goes the typically approach is to create the new tables and insert the data into them and then do a name swap and then add the check constraints.

And what about time partitioning? Do you have a job that creates new tables and merges old ones together?