by vemv 3582 days ago
I'd say rollback is possible under a 'git driven' workflow.

That said, I sincerely find rollback one of those inherently complex ideas:

- Rolling back assumes going back to the previous commit will fix everything, an unproven (unprovable?) hypothesis in the face of database migrations, job queues, etc.

- Making database migrations reversible can be nearly impossible (particularly at scale), and even where it is possible it takes significant engineering effort (for something that should absolutely never happen).

So I just don't contemplate the possibility of rolling back a deploy.

Instead I test things (particularly migrations) rigorously on staging:

- staging environments are always ephemeral: created from scratch for a given release

- always load fresh production DB into staging

- check that all my model objects are still `.valid?` (http://api.rubyonrails.org/classes/ActiveRecord/Validations....) after the migration

- leave staging running a few days.

- if you really can (not easy), forward production traffic to staging as well.
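The `.valid?` check in the list above could look roughly like this. It is a minimal sketch, not the original poster's code: `invalid_records` is a hypothetical helper name, and it only assumes each class responds to `find_each` and each record to `id` and `valid?`, as ActiveRecord models do.

```ruby
# Sketch: collect the ids of records that no longer pass validation,
# grouped by model name. `invalid_records` is a hypothetical helper.
def invalid_records(model_classes)
  failures = Hash.new { |hash, key| hash[key] = [] }

  model_classes.each do |model|
    # find_each loads records in batches instead of all at once.
    model.find_each do |record|
      failures[model.name] << record.id unless record.valid?
    end
  end

  failures
end
```

In a Rails console on staging you would pass in your own model classes, e.g. `invalid_records([User, Order])` (those two names are placeholders). An empty hash means every record still validates after the migration.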

If things go wrong (which under my proposed discipline would be a massive screw-up), then the fix would require analysis, a regular fix (no time travelling), and a regular deploy.

Reacting instantly (i.e. without analysis) is kind of delusional thinking. I'd rather stay broken a little longer for avoiding further complications!

2 comments

I heartily agree with your sentiment "rollback one of those inherently complex ideas" :) It's true that sometimes it's not even well-defined, such as for database migrations.

Some of this also depends on context. If I'm shipping a single primary deployment of a massive, fairly monolithic SaaS product, I can do this time-marches-on stuff. If I'm actually shipping shrinkwrapware -- and as a sibling comment says, doing rolling blue-green deploys also looks like this, if briefly -- being able to switch back to a previous code version is well worth caring about.

Code rollbacks are about immediate mitigation, not about pie-in-the-sky snapshot rollback. If you are sane about deployment and don't go to 100% of traffic instantly, then halting a broken deploy and rolling back is certainly better than shifting into analysis mode.

> I'd rather stay broken a little longer for avoiding further complications!

Unfortunately, analysis is a slow and unbounded process, and high-leverage businesses where every second of downtime has an actual, measurable loss simply cannot accept this trade-off.
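For concreteness, "don't go to 100% of traffic instantly, halt and roll back" might be sketched as below. Everything here is an assumption for illustration: `staged_rollout`, `set_traffic` and `error_rate` are hypothetical names standing in for whatever your load balancer and monitoring actually expose.

```ruby
# Sketch: raise traffic to the new version step by step, and send
# everything back to the old version as soon as errors exceed a threshold.
# `set_traffic` and `error_rate` are hypothetical callables.
def staged_rollout(steps:, max_error_rate:, set_traffic:, error_rate:)
  steps.each do |percent|
    set_traffic.call(percent)
    next if error_rate.call <= max_error_rate

    # Halt the deploy: 0% of traffic goes to the new version.
    set_traffic.call(0)
    return :rolled_back
  end

  :completed
end
```

Note that this only reverts which code version serves traffic. It does nothing about migrations or queued jobs, which is the hard part raised upthread.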

Good point. Few solutions are apt for every scale and every business!

OTOH, if you rehearse a given deployment again and again, you can become really confident that the operation will succeed in production.

Real example: the most important feature I've developed this year has been deployed to staging 5+ times over a couple of months. Each time I've asserted all kinds of things, gathered feedback from the business owner, etc.

The deployment going bad in production is just not a possibility.

At a larger scale than mine, I would probably introduce 'dark launching' as well. That would further reduce the possibility of needing rollback.
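A rough idea of what 'dark launching' could mean in code, as a sketch under assumptions rather than a description of any particular setup: the new code path runs on real inputs, but its result is only compared and recorded, never returned. `dark_launch`, `current`, `candidate` and `on_mismatch` are hypothetical names.

```ruby
# Sketch: serve the result of the current code path, run the candidate
# path in its shadow, and report any difference or failure.
def dark_launch(input, current:, candidate:, on_mismatch:)
  result = current.call(input)

  # A failure in the candidate must never affect the caller,
  # so an exception is captured and treated as a mismatch.
  shadow = begin
    candidate.call(input)
  rescue StandardError => e
    e
  end

  on_mismatch.call(input, result, shadow) unless shadow == result
  result
end
```

This only suits side-effect-free code paths: if the candidate writes to the database or sends email, running it in the shadow is not harmless.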