|
|
|
|
|
by ploxiln
4047 days ago
|
|
According to the post, the pause/lockup could only happen under production queries, and was not directly related to production data. They did test against the full production data, afterwards, and could not reproduce. Which was what they expected, since the migrations were on empty tables, which just happened to have foreign key constraints against large tables, but no rows for those constraints to actually apply to. So for them to "fully" test this migration before applying it to production, they would need to be replaying all production queries against the testing database as well, and maybe even test the migration multiple times to get a statistical sense of the possible latencies. (I've actually done something like that before, but it's not something you do for every little change) |
|
Most of our queries are fast (tens of milliseconds or lower), which is how we got away without knowing about this for so long. Unsurprisingly, we've been making a bigger effort to eliminate any slow queries we do find lately. ;)