Hacker News new | ask | show | jobs
by exekias 985 days ago
I believe this is one of the reasons why migrations become scary in many cases. If something goes wrong "the only path is forward". Also, rolling out new versions of the application means either breaking the previous versions (with some instances still running) or doing the migration in several steps.

We believe there is a better way, they way pgroll works, you can start a migration, and keep the old & new schemas working for as long as you need to rollout your app. If the new version of the app/schema doesn't behave as you were expecting, you only need to rollback the commit and undo the migration. pgroll guarantees that the previous version is still working during the whole process.

There is a graph in the readme depicting this concept:

https://github.com/xataio/pgroll/blob/main/docs/img/migratio...

4 comments

> If the new version of the app/schema doesn't behave as you were expecting, you only need to rollback the commit and undo the migration.

If I delete a "last_name" column, apply the migration, and then decide I shouldn't have deleted users' last names. Do I get that data back?

Just from my understanding from having read the linked website: yes, you do.

"Applying the migration" doesn't actually do anything to the table, it just creates a new schema containing views over the old one, where the view for the table whose column you deleted hides the column. You can then try if your app still works when using accessing this schema instead of the old one. If you're happy, you can "complete" the migration, at which point only the table structure actually gets altered in a non-reversible way.

^ this is exactly how it works :)
But if it works like that aren’t there schema migration paths that are changing the actual content of a column and are then not undoable?
Any pgroll operations[0] that require a change to an existing column, such as adding a constraint, will create a new copy of the column and backfill it using 'up' SQL defined in the migration and apply the change to that new column.

There are no operations that will modify the data of an existing column in-place, as this would violate the invariant that the old schema must remain usable alongside the new one.

[0] - https://github.com/xataio/pgroll/tree/main/docs#operations-r...

Maybe this is explained somewhere in the docs but I'm lazy: how does it cope with possible performance issue in highly trafficked tables? Can you somehow control the backfill speed if it's taking up too much I/O?
Also, if the data isn't deleted couldn't this lead to database bloat?
The bloat incurred by the extra column is certainly present while the migration is in progress (ie after it's been started with `pgroll start` but before running `pgroll complete`).

Once the migration is completed any extra columns are dropped.

I recently had to do a migration on a timescale hypertable where a "schema" was migrated for a table which had jsonb columns containing arrays of arrays of numbers to a new table containing the same data as two-dimensional postgres arrays of numeric[][] data (better storage characteristics)

Our workflow was something like:

1) Create the new hypertable

2) Create after insert trigger on first table to insert transformed data from first table into second table, and delete from first table (this ensured applications can continue running using first schema/table, without any new data being added to first table after migration)

3) Iterate over first table in time-bucketed batches using a plpgsql block to move chunks of data from first table to second table.

Would pgroll enable a similar workflow? I guess I'm curious if the way pgroll works would similarly create a trigger to allow apps to work with the initial schema as a stopgap... I guess pgroll would perform the whole migration as a series of column updates on a single table, but I'm unclear on whether it attempts to migrate all data in one step (potentially locking the table for longer periods?) while also allowing applications using the old schema to continue working so there is no downtime as changes are rolled out.

Has pgroll been tested with timescaledb at all?

To do this with pgroll I would use an alter_column migration, changing the type: https://github.com/xataio/pgroll/tree/main/docs#change-type, this would:

1) Create a new column with the desired type (numeric[][] in your case) 2) Backfill it from the original one, executing the up function to do the casting and any required transformation 3) Install a trigger to execute the up function for every new insert/update happening in the old schema version 4) After complete, remove the old column, as it's no longer needed in the new version of the schema

Backfills are executed in batches, you can check how that works here: https://github.com/xataio/pgroll/blob/main/pkg/migrations/ba...

I don't think any of us has tested pgroll against timescaledb but I would love to know about the results if anyone does!

Is my understanding correct that the need to copy columns makes starting a migration potentially extremely expensive on a large database?
Yes, for those pgroll migrations that require a new column + backfill, starting the migration can be expensive.

Backfills are done in fixed size batches to avoid long lived row locks, but the operation can still be expensive in terms of time and potentially I/O. Options to control the rate of backfilling could be a useful addition here but they aren't present yet.

This is almost exactly how I did a similar migration, also in Timescale. I used PL/pgSQL and sqitch, did you use a migration tool?
No, this was all done in handwritten .sql scripts. I don't think it matters too much in this case, but we're using Rust and the sqlx cli for driving the migrations, but that basically just runs the sql migration scripts
Apologies for the off-topic-ness, but no matter where I've tried putting the mouse focus on the post, Up/Down don't work to scroll (but PgUp/PgDown are fine).

(I very much appreciate the effort to provide tooling that puts all these things together, btw)

Thanks for reporting! we will look into it
Sorry, forgot to say: Win10 / Firefox.
That's great that pgroll does this, but the heavy lifting for supporting this comes at a huge cost on the application side, IMO.
Do you mean the extra configuration required to make applications use the correct version of the database schema, or something else?
Yea, keeping your application consistent with two different schema versions. And I'm not saying from a blue/green standpoint, from whatever pgroll does instead so when the rollback happens you don't lose data.
I don't see the need to keep your application consistent with both schema versions. During a migration pgroll exposes two Postgres schema - one for the old version of the database schema and another for the new one. The old version of the application can be ignorant of the new schema and the new version of the application can be ignorant of the old.

pgroll (or rather the database triggers that it creates along with the up and down SQL defined in the migration) does the work to ensure that data written by the old applications is visible to the new and vice-versa.

A rollback in pgroll then only requires dropping the schema that contains the new version of the views on the underlying tables and any new versions of columns that were created to support them.

Eh, isn't this making it easier on the application side? Today, when I make a change in my app that needs a migration, I need multiple steps to make sure I don't break old instances of the app still running. With this it looks like one can avoid all that?
That is what this project is trying to achieve. By allowing your client apps to access both the old and the new schema at the same time, you can have the old instances of your application working while the new ones (using the new schema) get deployed.

They can work in parallel for a while until you complete the rollout and call the migration as done.