| HN Mirror



	by nikita2206 1114 days ago
	Very interesting read. Having done similar migrations myself, it's nice to see that bigger players make the same choice about how to carry out the migration. I was surprised that they had to cancel the ~10 queries that were in flight at the moment they switched over the query traffic. When I did this with ProxySQL, there was an option to pause all connections so that they can't start new transactions or queries, without cancelling the running ones, then wait for everything in flight to finish, do the switch, and unpause.
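	For readers who want the pause/drain/switch/unpause sequence spelled out, here is a minimal in-process sketch in Python. It is not ProxySQL's implementation or configuration. `CutoverGate`, `run`, and `switch` are hypothetical names, and the "backend" is assumed to be a plain callable standing in for a database connection.

```python
import threading


class CutoverGate:
    """Toy stand-in for a proxy that can hold new queries during a switch.

    All names here are illustrative assumptions, not a real proxy API.
    """

    def __init__(self, backend):
        self._cond = threading.Condition()
        self._backend = backend      # callable that executes a query
        self._paused = False
        self._in_flight = 0

    def run(self, query):
        with self._cond:
            # New queries block here while a cutover is in progress.
            while self._paused:
                self._cond.wait()
            self._in_flight += 1
            backend = self._backend
        try:
            # Execute outside the lock so queries can run concurrently.
            return backend(query)
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def switch(self, new_backend, drain_timeout=None):
        """Pause, wait for in-flight queries to drain, swap, then unpause.

        Returns True if everything drained before the timeout. On False the
        swap still happens; dealing with stragglers is left to the caller.
        """
        with self._cond:
            self._paused = True
            drained = self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=drain_timeout
            )
            self._backend = new_backend
            self._paused = False
            self._cond.notify_all()
            return drained
```

	Calling `gate.switch(new_backend, drain_timeout=1.0)` holds new callers of `run` until the running ones finish or the timeout passes, which is the window during which traffic backs up.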


agf 1114 days ago

I've been in situations like this where the cost of killing active queries was lower than the cost of pausing traffic (and having it potentially back up or time out) for the extra time it would take for those queries to finish.

Just because you can wait for them to finish doesn't mean it's better to, once you look at the cutover as a whole.
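One way to express that trade-off in code is a bounded drain: give in-flight work a fixed budget, then cancel whatever is left. The following is a rough sketch using Python's `asyncio`, where tasks stand in for queries. `drain_or_cancel`, `budget_s`, and the demo timings are assumptions made up for illustration, not anything described in the article.

```python
import asyncio


async def drain_or_cancel(tasks, budget_s):
    """Wait up to budget_s for in-flight tasks, then cancel the rest.

    Returns (done, cancelled) as two sets of tasks.
    """
    if not tasks:
        return set(), set()
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()
    # Give the cancelled tasks a chance to unwind before switching over.
    await asyncio.gather(*pending, return_exceptions=True)
    return done, pending


async def demo():
    async def query(name, seconds):
        # Hypothetical query that just sleeps for its "runtime".
        await asyncio.sleep(seconds)
        return name

    tasks = [
        asyncio.create_task(query("fast", 0.01)),
        asyncio.create_task(query("slow", 5.0)),
    ]
    done, cancelled = await drain_or_cancel(tasks, budget_s=0.2)
    return sorted(t.result() for t in done), len(cancelled)
```

With a 0.2 second budget the short query completes and the long one is cancelled, so the pause is capped at the budget instead of at the slowest query.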

ye-olde-sysrq 1114 days ago

Also, if you asked me to pick my poison: things get partially available / degraded for a long period of time, or there's a blip of full unavailability during a cutover, I'd pick the latter 9 times out of 10. I find people are pretty good about writing code to deal with "does it work y/n" but people are often a lot less good about "does it nominally work but is going so slow it will never complete / other things will time out in unexpected orders before this finishes / etc". Some of the worst incidents I've seen were "partial" outages that spanned a long time period until the right thing could be drained/kicked/whatever.

dmattia 1114 days ago

This took me a long time to accept in my career, but I do believe you've summarized this in a way that rings true for me as well.

sharadov 1113 days ago

Take that one big hit vs death by a thousand cuts!

nikita2206 1114 days ago

Ah, indeed, this is a trade-off.

yxre 1114 days ago

I was surprised that they went for database partitioning first.

Caching and optimization weren't mentioned at all, but I guess they had already maxed out that path.
