I really enjoy reading how organizations have implemented continuous deployment.
One question I have which is not addressed by the article is how to deal with database changes. Every database has difficulty with schema migrations to one degree or another, but MySQL (which IIRC is what GitHub uses) is particularly bad. In my organization, we are VERY careful with any deploy that contains a migration.
(I suppose this is where GitHub's staging environments with real world datasets come in.)
One workaround I've considered is automatically deploying code that doesn't contain a migration (which is the vast majority) and forcing a more manual approach to database migrations, to make sure people are on hand to roll it back if necessary.
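That auto/manual split could be driven by a simple check on the files a deploy touches. Here is a minimal sketch, assuming (hypothetically) that migrations live under db/migrate/ as in a Rails app, and that the list of changed paths comes from something like "git diff --name-only origin/master...HEAD":

```python
from typing import Iterable

# Hypothetical convention: schema migrations live under db/migrate/ (Rails-style).
MIGRATION_PREFIXES = ("db/migrate/",)

def contains_migration(changed_paths: Iterable[str]) -> bool:
    """Return True if any changed file looks like a schema migration."""
    return any(path.startswith(MIGRATION_PREFIXES) for path in changed_paths)

def deploy_mode(changed_paths: Iterable[str]) -> str:
    """Migration-free changes deploy automatically; anything with a migration waits for a human."""
    return "manual" if contains_migration(changed_paths) else "automatic"

print(deploy_mode(["app/models/user.rb", "app/views/users/show.html.erb"]))
print(deploy_mode(["app/models/user.rb", "db/migrate/20130101000000_add_index.rb"]))
```

The first call prints "automatic" and the second "manual"; the manual path is where someone stays on hand to roll back.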
That's awesome! Seems like a really solid setup. Bufferapp took a page from your book and has the same deploy setup with Hubot, and it works out really well for them, too.
I have three questions to ask about DB migrations (which I can guess the answers to but would love to hear directly), if that's okay:
How do you handle a DB migration with a staged rollout (to 2 of N production servers)?
How do you organise timing between a migration deploy and code deploy if one is done before the other?
It's also really important that migrations don't affect the running app code. New columns shouldn't be used yet. Removed columns or tables need to have all references removed first before running the migration. We confirm this with a deprecation helper that sends metrics to Graphite.
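The thread doesn't show the deprecation helper itself, but the idea can be sketched: any code path that still touches a column or table slated for removal reports a counter to Graphite, and the removal migration waits until that counter stays at zero. This assumes Graphite's plaintext protocol (one "path value timestamp" line per metric, with carbon listening on TCP port 2003 by default); the deprecated() helper and the app.deprecations.* metric names are hypothetical.

```python
import socket
import time
from typing import Callable, Optional

def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"

def send_to_graphite(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Ship one line to carbon's plaintext listener."""
    with socket.create_connection((host, port), timeout=1) as sock:
        sock.sendall(line.encode("ascii"))

def deprecated(name: str, send: Callable[[str], None] = send_to_graphite) -> None:
    """Hypothetical helper: call it on any code path that still uses a soon-to-be-removed column."""
    send(graphite_line(f"app.deprecations.{name}", 1))

# Example: the old code path reading users.login would call
#   deprecated("users.login")
# and the migration dropping the column waits until that metric goes quiet.
```

Opening a connection per call keeps the sketch short; a real helper would probably batch metrics or go through an aggregator instead.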
That's about all I can answer from the app side :)
I was assuming that a column rename would duplicate the column in a first migration, so that both the old and the new code would work correctly. I guess the only complication is that you need to keep track of which branches have been successfully rebased onto or merged into master before you can run the second, cleanup migration.
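A rough sketch of that first "expand" migration, using an in-memory SQLite database so it runs anywhere (the thread is about MySQL, where ALTER TABLE on a big table behaves very differently). The helper name is made up, and the sketch only backfills existing rows: keeping both columns in sync while old and new code run side by side (dual writes or triggers) is left out, as is the later cleanup migration that drops the old column.

```python
import sqlite3

def expand_rename(conn: sqlite3.Connection, table: str, old: str, new: str,
                  col_type: str = "TEXT") -> None:
    """Phase 1 of a rename: add the new column and backfill it, keeping the old column."""
    # Table/column names can't be bound as SQL parameters; they are assumed to be
    # trusted, hard-coded migration values, never user input.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {new} {col_type}")
    conn.execute(f"UPDATE {table} SET {new} = {old}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
conn.execute("INSERT INTO users (login) VALUES ('octocat')")
conn.commit()

expand_rename(conn, "users", "login", "username")

# Old code still reads `login`; new code can already read `username`.
old_value = conn.execute("SELECT login FROM users").fetchone()[0]
new_value = conn.execute("SELECT username FROM users").fetchone()[0]
print(old_value, new_value)  # octocat octocat
```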
The "merge once deployed to production" approach, even if GitHub advocates it, seems extremely weird to me. They do seem to have a staging check first, which is good.
It seems you'd want to merge first, so that you know the branch works when combined with "all the things" on master, which more closely mirrors what you'll get once it's merged in.
So they could just merge first and then, if staging passes in their CI system, automatically deploy to prod, which is how many orgs do it.
My point, though, is that you'd want to deal with the merge fun (if any) first. Otherwise you have to make sure every branch (pull request) you test already contains ALL of the commits from master (rebased, etc.), so it's easier to just make sure branches land on master first. If you don't, you might "remove" something from prod for a while until it's merged in. Not good.
They may have mechanisms to deal with that, but in this case it doesn't seem like something I'd recommend for most people; it feels weird and organically evolved. One branch may lack commits another has, and both could be deployed without merging, leaving GitHub's deployed code fluctuating back and forth as one commit drops out and another drops in, until finally both are in at the same time.
I'm guessing that the master branch is first merged down to their develop branch or directly to their release branch. That way their release branch does have everything in master already.
> Since master is our stable release branch, we want to ensure that any branches being deployed are caught up with the latest code in master. Before proceeding with a deployment, Hubot will detect if our branch is behind and automatically merge in master if required.
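The auto-merge step in that quote can be approximated with two plain git commands. This is a sketch, not Hubot's actual code (which the thread doesn't show): it assumes origin/master has already been fetched, and the runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]

def git(args: List[str]) -> str:
    """Run a git command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout

def commits_behind(branch: str, base: str = "origin/master", run: Runner = git) -> int:
    """Count commits on `base` that `branch` does not contain yet."""
    # `git rev-list --count A..B` counts commits reachable from B but not from A.
    return int(run(["rev-list", "--count", f"{branch}..{base}"]).strip())

def catch_up_with_master(branch: str, base: str = "origin/master", run: Runner = git) -> bool:
    """Merge `base` into `branch` if it is behind; return True when a merge was made."""
    if commits_behind(branch, base, run) == 0:
        return False
    run(["checkout", branch])
    run(["merge", "--no-edit", base])
    return True
```

If the merge conflicts, git exits non-zero and check=True raises, which is a reasonable point for a deploy bot to stop and ask a human.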
We've adopted some aspects of this flow. Our take is that we test feature branches thoroughly on stage, including a CI test run and code review.
Then we merge to master and let all the CI run again while we manually verify. Any troubles and you revert. All green? Deploy right away. We try never to deploy more than 2-3 changes to production at a time.
The main bottleneck for us is the speed of our CI runs. It's tempting to merge in a lot of changes to master and let them accumulate on QA. Reducing the test run time is an ongoing goal and should make this system pretty scalable for our team.
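The merge-then-deploy loop described above (re-run CI on master, revert on failure, deploy when green, keep batches to 2-3 changes) boils down to two small decisions. A hedged sketch with hypothetical names, where the cap of 3 comes from the comment and is enforced by refusing new merges once that many changes are waiting:

```python
from dataclasses import dataclass
from typing import List

MAX_CHANGES_PER_DEPLOY = 3  # "never more than 2-3 changes to production at a time"

@dataclass
class MasterState:
    ci_green: bool
    undeployed_changes: List[str]  # merged to master but not yet in production

def may_merge_to_master(state: MasterState) -> bool:
    """Stop changes piling up on master faster than they can be deployed."""
    return state.ci_green and len(state.undeployed_changes) < MAX_CHANGES_PER_DEPLOY

def next_action(state: MasterState) -> str:
    """Any troubles: revert. All green with something pending: deploy right away."""
    if not state.ci_green:
        return "revert"
    if state.undeployed_changes:
        return "deploy"
    return "idle"
```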
Usually, you won't have merge conflicts if you deploy early and often and keep feature branches deliberately small. For larger stories, consider breaking them into discrete feature branches that each implement part of the functionality (e.g., behind a feature gate).
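For the feature-gate suggestion, here is a minimal sketch of what a gate might look like: an explicit allowlist plus a stable percentage rollout, so unfinished code can sit on master (and in production) switched off. The FeatureGate class and its fields are hypothetical, not any particular library's API.

```python
import zlib
from dataclasses import dataclass, field
from typing import Set

@dataclass
class FeatureGate:
    """Hypothetical gate: explicit allowlist plus a stable percentage rollout."""
    name: str
    percentage: int = 0                              # 0..100 of actors who see the feature
    actors: Set[str] = field(default_factory=set)    # always enabled for these actors

    def enabled_for(self, actor: str) -> bool:
        if actor in self.actors:
            return True
        # crc32 is stable across processes (unlike Python's salted hash()),
        # so an actor stays in or out of the rollout between requests.
        bucket = zlib.crc32(f"{self.name}:{actor}".encode()) % 100
        return bucket < self.percentage
```

Usage: merge the first slice with FeatureGate("new_dashboard", actors={"staff-user"}), then raise percentage as later branches land.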
The deployment system automatically merges master into the branch being tested on deployment, and will not deploy any branch that does not contain master. In fact, there is a check that nags the main app team if master is not deployed to production for some reason (usually someone merging docs changes while someone is testing a "real" branch). It's considered an abnormal state, and I will often block all deployments until we figure out why master hasn't gone out.
Even forced deployments (which ignore CI and a few other checks for emergencies or maintenance mode) won't deploy a branch that's 24 hours behind master.
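The two rules above (normal deploys must contain master; even forced deploys refuse a branch 24 hours behind) fit in a small policy check. This sketch assumes "24 hours behind" means the oldest master commit missing from the branch is more than 24 hours old, which is an interpretation rather than something the comment spells out. The BranchState fields are hypothetical; contains_master could come from "git merge-base --is-ancestor origin/master <branch>", which exits 0 when master is already in the branch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_STALENESS = timedelta(hours=24)

@dataclass
class BranchState:
    contains_master: bool                              # master's tip is an ancestor of the branch
    oldest_missing_master_commit: Optional[datetime]   # None when nothing is missing

def may_deploy(state: BranchState, forced: bool, now: datetime) -> bool:
    if not forced:
        # Normal deploys (after master is auto-merged) must contain master.
        return state.contains_master
    # Forced deploys skip CI and other checks, but still refuse very stale branches.
    if state.oldest_missing_master_commit is None:
        return True
    return now - state.oldest_missing_master_commit < MAX_STALENESS
```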
If one of your developers' Campfire accounts (assuming y'all still use it) gets popped, does the attacker now have the ability to deploy to production, or is there some other mitigating factor not mentioned here?
The "lab" environments have enough capacity to allow plenty of parallel testing. When deploying to production, it's expected that you already know your branch does what it's supposed to do. A production deploy is when one makes sure a branch has not introduced regressions, so holding it for more than 15 minutes is rare.
We're following GitHub Flow and currently do these steps manually on deployment (i.e., merge the branch with master, check out the branch). Then we roll back to master if something fails; otherwise we merge master with the branch. It would be great to have some tooling to help with this.
Has anyone done any open-source work on this already?
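Until someone points to an existing tool, the manual flow described above (merge master into the branch, deploy the branch, roll back to master on failure, otherwise land the branch on master) could be scripted along these lines. It's a sketch with hypothetical names: deploy() and healthy() stand in for whatever deploys a ref and checks production, and it reads "merge master with the branch" as merging the branch into master.

```python
from typing import Callable, List

Git = Callable[[List[str]], None]

def ship_branch(branch: str, git: Git, deploy: Callable[[str], None],
                healthy: Callable[[], bool]) -> str:
    """Deploy `branch` first; merge it to master only if production stays healthy."""
    git(["fetch", "origin"])
    git(["checkout", branch])
    git(["merge", "--no-edit", "origin/master"])  # catch the branch up with master
    deploy(branch)
    if not healthy():
        deploy("master")                          # roll production back to master
        return "rolled back"
    git(["checkout", "master"])
    git(["merge", "--ff-only", "origin/master"])  # make sure local master is current
    git(["merge", "--no-ff", branch])             # land the deployed branch on master
    git(["push", "origin", "master"])
    return "merged"
```

Passing git, deploy and healthy in as functions keeps the flow testable; in practice git could be a thin wrapper around subprocess.run(["git", *args], check=True).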