HN Mirror


by pcthrowaway 1032 days ago

I recently had to do a migration on a Timescale hypertable: a table whose jsonb columns contained arrays of arrays of numbers was migrated to a new table holding the same data as two-dimensional Postgres arrays (numeric[][]), which have better storage characteristics.
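
As a rough illustration of the transformation described above (not the poster's actual code, which was SQL), here is a small Python sketch that turns a JSON array of arrays into a Postgres array literal. The function name `to_pg_array_literal` is made up for this example. It rejects ragged input, because Postgres multidimensional arrays must be rectangular.

```python
import json
from decimal import Decimal


def to_pg_array_literal(jsonb_text: str) -> str:
    """Convert a JSON array of arrays of numbers into a Postgres 2-D array literal.

    Hypothetical helper for illustration only.
    """
    # Parse numbers as Decimal so values are not rounded through binary floats.
    rows = json.loads(jsonb_text, parse_float=Decimal, parse_int=Decimal)
    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        raise ValueError("expected an array of arrays")

    # Every inner array must have the same length (rectangular shape).
    if len({len(r) for r in rows}) > 1:
        raise ValueError("ragged rows cannot be stored as a 2-D array")

    def fmt(value):
        if value is None:
            return "NULL"
        if isinstance(value, Decimal):
            return format(value, "f")  # plain notation, no exponent
        raise ValueError(f"not a number: {value!r}")

    return "{" + ",".join("{" + ",".join(fmt(v) for v in r) + "}" for r in rows) + "}"
```

The resulting string could then be passed as a parameter and cast to the array type on the Postgres side.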

Our workflow was something like:

1) Create the new hypertable

2) Create an AFTER INSERT trigger on the first table that inserts the transformed data into the second table and deletes it from the first table (this ensured applications could continue running against the first schema/table, without any new data being added to the first table after the migration)

3) Iterate over the first table in time-bucketed batches using a PL/pgSQL block to move chunks of data from the first table to the second table.
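
Step 3 can be sketched roughly as follows. This is Python rather than the PL/pgSQL the poster used, and the names `time_buckets`, `move_in_buckets`, and the `move_chunk` callback are hypothetical. The idea is just to split the table's time range into half-open windows and move one window at a time, so that each move touches a bounded slice of rows.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, Tuple


def time_buckets(
    start: datetime, end: datetime, width: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [lo, hi) windows that cover [start, end) without overlap."""
    if width <= timedelta(0):
        raise ValueError("width must be positive")
    lo = start
    while lo < end:
        hi = min(lo + width, end)  # the last window may be shorter
        yield lo, hi
        lo = hi


def move_in_buckets(
    start: datetime,
    end: datetime,
    width: timedelta,
    move_chunk: Callable[[datetime, datetime], int],
) -> int:
    """Call move_chunk(lo, hi) per window and return the total rows it reports.

    move_chunk is assumed to move rows with lo <= ts < hi from the old table
    to the new one (ideally in its own transaction) and return the row count.
    """
    total = 0
    for lo, hi in time_buckets(start, end, width):
        total += move_chunk(lo, hi)
    return total
```

Using half-open windows means a row sitting exactly on a bucket boundary is moved once, by the later window only.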

Would pgroll enable a similar workflow? I'm curious whether pgroll would similarly create a trigger to allow apps to keep working with the initial schema as a stopgap. I assume pgroll would perform the whole migration as a series of column updates on a single table, but I'm unclear on whether it attempts to migrate all data in one step (potentially locking the table for longer periods) while still allowing applications using the old schema to continue working, so there is no downtime as changes are rolled out.

Has pgroll been tested with timescaledb at all?

2 comments

exekias 1032 days ago

To do this with pgroll I would use an alter_column migration, changing the type: https://github.com/xataio/pgroll/tree/main/docs#change-type. This would:

1) Create a new column with the desired type (numeric[][] in your case)

2) Backfill it from the original one, executing the up function to do the casting and any required transformation

3) Install a trigger to execute the up function for every new insert/update happening in the old schema version

4) After complete, remove the old column, as it's no longer needed in the new version of the schema
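
To make steps 1 and 3 concrete, here is a toy sketch of the general "new column plus trigger" pattern. It uses Python's built-in sqlite3 module instead of Postgres and is not how pgroll implements it. The table `readings`, the columns `value_old` and `value_new`, and the function `install_up_triggers` are hypothetical, and the "up function" is reduced to a simple cast.

```python
import sqlite3


def install_up_triggers(conn: sqlite3.Connection) -> None:
    """Add the new column and keep it in sync with writes to the old column."""
    conn.executescript(
        """
        -- Step 1: new column with the desired type.
        ALTER TABLE readings ADD COLUMN value_new REAL;

        -- Step 3: apply the "up" transformation on every insert...
        CREATE TRIGGER readings_up_insert AFTER INSERT ON readings
        BEGIN
            UPDATE readings
               SET value_new = CAST(NEW.value_old AS REAL)
             WHERE id = NEW.id;
        END;

        -- ...and on every update of the old column.
        CREATE TRIGGER readings_up_update AFTER UPDATE OF value_old ON readings
        BEGIN
            UPDATE readings
               SET value_new = CAST(NEW.value_old AS REAL)
             WHERE id = NEW.id;
        END;
        """
    )
```

With the triggers in place, a client that only knows about `value_old` can keep writing while `value_new` stays populated. Existing rows would still need the backfill from step 2.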

Backfills are executed in batches; you can check how that works here: https://github.com/xataio/pgroll/blob/main/pkg/migrations/ba...

I don't think any of us has tested pgroll against timescaledb but I would love to know about the results if anyone does!

vlovich123 1032 days ago

Is my understanding correct that the need to copy columns makes starting a migration potentially extremely expensive on a large database?

surjection 1032 days ago

Yes, for those pgroll migrations that require a new column + backfill, starting the migration can be expensive.

Backfills are done in fixed-size batches to avoid long-lived row locks, but the operation can still be expensive in terms of time and potentially I/O. Options to control the rate of backfilling could be a useful addition here, but they aren't present yet.
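
For readers unfamiliar with the pattern, a fixed-size batched backfill can be sketched like this. It is a toy version using Python's sqlite3 module, not pgroll's actual code. The table `items`, its columns, and the function `backfill_in_batches` are hypothetical, ids are assumed to be positive integers, and the `pause_seconds` parameter is only an illustration of what a rate control might look like, not an existing pgroll option.

```python
import sqlite3
import time


def backfill_in_batches(
    conn: sqlite3.Connection, batch_size: int = 1000, pause_seconds: float = 0.0
) -> int:
    """Copy old_col into new_col batch by batch; return the number of batches run."""
    last_id = 0  # assumes ids are positive integers
    batches = 0
    while True:
        with conn:  # one short transaction per batch, committed on exit
            ids = [
                row[0]
                for row in conn.execute(
                    "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
                    (last_id, batch_size),
                )
            ]
            if not ids:
                return batches  # nothing left to backfill
            conn.execute(
                "UPDATE items SET new_col = CAST(old_col AS REAL) "
                "WHERE id > ? AND id <= ?",
                (last_id, ids[-1]),
            )
        last_id = ids[-1]  # resume after the last id handled in this batch
        batches += 1
        if pause_seconds:
            time.sleep(pause_seconds)  # hypothetical throttle between batches
```

Paging by primary key rather than by OFFSET keeps each batch cheap to locate, and committing per batch means locks are held only for the duration of one batch.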

claytonjy 1032 days ago

This is almost exactly how I did a similar migration, also in Timescale. I used PL/pgSQL and sqitch. Did you use a migration tool?

pcthrowaway 1032 days ago

No, this was all done in handwritten .sql scripts. I don't think it matters too much in this case, but we're using Rust and the sqlx CLI to drive the migrations, which basically just runs the SQL migration scripts.
