Hacker News new | ask | show | jobs
by strictfp 1516 days ago
I agree in principle. But one major headache for us has been upgrading the database software without downtime. Is there any solution that does this without major headaches? I would love some out-of-the-box solution.
8 comments

The best trick I know of for zero-downtime upgrades is to have a read-only mode.

Sure, that's not the same thing as pure zero-downtime but for many applications it's OK to put the entire thing into read-only mode for a few minutes at a well selected time of day.

While it's in read-only mode (so no writes are being accepted) you can spin up a brand new DB server, upgrade it, finish copying data across - do all kinds of big changes. Then you switch read-only mode back off again when you're finished.

I've even worked with a team used this trick to migrate between two data centers without visible end-user downtime.

A trick I've always wanted to try for smaller changes is the ability to "pause" traffic at a load balancer - effectively to have a 5 second period where each incoming HTTP request appears to take 5 seconds longer to return, but actually it's being held by the load balancer until some underlying upgrade has completed.

Depends how much you can get done in 5 seconds though!

>The best trick I know of for zero-downtime upgrades is to have a read-only mode.

I've done something similar, although it wasn't about upgrading the database. We needed to not only migrate data between different DB instances, but also between completely different data models (as part of refactoring). We had several options, such as proper replication + schema migration in the target DB, or by making the app itself write to two models at the same time (which would require a multi-stage release). It all sounded overly complex to me and prone to error, due to a lot of asynchronous code/queues running in parallel. I should also mention that our DB is sharded per tenant (i.e. per an organization). What I came up with was much simpler: I wrote a simple script which simply marked a shard read-only (for this feature), transformed and copied data via a simple HTTP interface, then marked it read-write again, and proceeded to the next shard. All other shards were read-write at a given moment. Since the migration window only affected a single shard at any given moment, no one noticed anything: for a tenant, it translated to 1-2 seconds of not being able to save. In case of problems it would also be easier to revert a few shards than the entire database.

I like this approach.

I'm picturing your migration script looping over shards. It flips it to read-only, migrates, then flips back to read-write.

How did the app handle having some shards in read-write mode pre-migration and other shards in read-write post-migration simultaneously.

Yes, it simply looped over shards, we already had a tool to do that.

The app handled it by proxying calls to the new implementation if the shard was marked as "post-migration", the API stayed the same. If it was "in migration", all write operations returned an error. If the state was "pre-migration", it worked as before.

I don't already remember the details but it was something about the event queue or the notification queue which made me prefer this approach over the others. When a shard was in migration, queue processing was also temporarily halted.

Knowing that a shard is completely "frozen" during migration made it much easier to reason about the whole process.

Depends on the database - I know that CockroachDB supports rolling upgrades with zero downtime, as it is built with a multi-primary architecture.

For PostgresQL or MySQL/MariaDB, your options are more limited. Here are two that come to mind, there may be more:

# The "Dual Writer" approach

1. Spin up a new database cluster on the new version. 2. Get all your data into it (including dual writes to both the old and new version). 3. Once you're confident that the new version is 100% up to date, switch to using it as your primary database. 4. Shut down the old cluster.

# The eventually consistent approach

1. Put a queue in front of each service for writes, where each service of your system has its own database. 2. When you need to upgrade the database, stop consuming from the queue, upgrade in place (bringing the DB down temporarily) and resume consumption once things are back online. 3. No service can directly read from another service's database. Eventually consistent caches/projections service reads during normal service operation and during the upgrade.

A system like this is more flexible, but suffers from stale reads or temporary service degradation.

Dual writing has huge downsides: namely you're now moving consistency into the application, and it's almost guaranteed that the databases won't match in any interesting application.
I'd think using built-in replication (e.g. PostgreSQL 'logical replication') for 'dual writing' should mostly avoid inconsistencies between the two versions of the DB, no?
Yes, though I've only ever seen people use the term "dual writing" to refer to something at a higher-than-DB-level.

The way I've done this involves logical replication also: https://news.ycombinator.com/item?id=31087197

Plus that you need to architect this yourself, with all the black magic involved to not mess something up.
Preconditions:

1. Route all traffic through pgbouncer in transaction pooling mode.

2. Logically replicate from old to new.

For failover:

1. Ensure replication is not far behind.

2. Issue a PAUSE on pgbouncer.

3. Wait for replication to be fully caught up.

4. Update pgbouncer config to point to the new database.

5. Issue a RELOAD on pgbouncer.

6. Issue a RESUME on pgbouncer.

Zero downtime; < 2s additional latency for in-flight queries at time of op is possible (and I've done it at scale).

The way I've done it with MySQL since 5.7 is to use multiple writers of which only one is actively used by clients. Take one out, upgrade it, put it back into replication but not serving requests until caught up. Switch the clients to writing to the upgraded one then upgrade the others.
This is such a huge problem. It's even worse than it looks: because users are slow to upgrade, changes to the database system take years to percolate down to the 99th percentile user. The decreases the incentive to do certain kinds of innovation. My opinion is that we need to fundamentally change how DBMS are engineered and deployed to support silent in-the-background minor version upgrades, and probably stop doing major version bumps that incorporate breaking changes.
The system needs to be architected in certain way to make upgrade without downtime. Something like the Command and Query Responsibility Segregation (CQRS) would work. A update queue serves as the explicit transaction log keeping track of the updates from the frontend applications, while the databases at the end of the queue applies updates and serves as the querying service. Upgrading the live database just means having a standby database with new version software replaying all the changes from the queue to catch up to the latest changes, pausing the live database from taking new changes from the queue when the new db has caught up, switching all client connections to the new db, and shutting down the old db.
Cassandra can do it since it has cell level timestamps, so you can mirror online writes and clone existing data to the new database, and there's no danger of newer mutations being overwritten by the bulk restored data.

Doing an active no-downtime database migration basically involves having a coherent row-level merge policy (assuming you AT LEAST have a per-row last updated column), or other tricks. Or maybe you temporarily write cell-level timestamps and then drop it later.

Or if you have data that expires on a window, you just do double-writes for that period and then switch over.

Migrate the data to a new host having the new version.