by boffinism 2966 days ago
> The process of deploying code to production is very simple, and takes about ten minutes total. This results in a life cycle in which we deploy code to production approximately 100 times per day.

What? They spend 1000 minutes out of every 1440 deploying to production? The deployment process is occurring over 16 hours out of every 24? Am I the only one who is nonplussed by this?

EDIT: Ok I get it, I get it. I guess I always worked in much smaller companies where CD meant deploying about 10 times a day tops. TIL big companies are big.

When you have a team of 200 people, deploying 100 times a day isn't that big of a number.

A culture of continuous deployment is often hard to fathom for people who've never worked at a company with one. Everything, down to what you write and how you write it, is influenced by being able to deploy it and see its effects almost instantly.

These aren't huge, sweeping changes being deployed. They're small pieces of larger feature sets. It's more like: deploy a conditional statement with logging and confirm from the logs that it works; next, deploy the view you're testing behind a feature flag, toggled so that only you and the PM can see it, and check that it is properly called from the earlier conditional; when things look good, deploy some controller code that handles form requests from the view, etc.
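
To make the first two steps concrete, here is a minimal Python sketch of a logged conditional guarding a flagged view. Everything in it is hypothetical and for illustration only: the flag name `new_checkout_view`, the in-memory `FLAGS` store, the user ids, and `render_checkout` are my assumptions, not anything from Slack's codebase.

```python
import logging

logger = logging.getLogger("checkout")

# Hypothetical in-memory flag store: flag name -> user ids allowed to see it.
# A real setup would read this from a config service or database.
FLAGS = {"new_checkout_view": {"dev_alice", "pm_bob"}}


def flag_enabled(flag: str, user_id: str) -> bool:
    """Return True if the flag is switched on for this user."""
    return user_id in FLAGS.get(flag, set())


def render_checkout(user_id: str) -> str:
    """Serve the new view only to flagged users; everyone else is unchanged."""
    if flag_enabled("new_checkout_view", user_id):
        # Deploy 1 could ship just this branch and the log line, so the logs
        # confirm that the conditional fires for the right users.
        logger.info("new_checkout_view served to %s", user_id)
        # Deploy 2 adds the actual view behind the same flag.
        return "new view"
    return "old view"
```

Only the developer and the PM hit the new branch, so each deploy stays small and a bad one is easy to spot and switch off.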

You deploy small changes piecemeal and so spread out the risk over a larger period of time. It makes identifying issues with a new piece of code almost trivial. Needing to debug 30 lines of code is so much less harrowing than needing to look over 900.

Also depends what you are working on...
Not a Slack employee, but I worked at a company with similar CD views:

(Likely) various groups of people are deploying to production throughout the day. Out of those 100 deploys, an individual is probably only involved in 1 or 2 a day. As soon as you're ready to deploy your code, you queue up and see it all the way through to production along with probably a few other people doing the same thing.

The actual "change the servers over to the new production code" process is usually instantaneous or extremely quick, the 10 minutes is mostly spent testing/building/etc.

People (including myself) enjoy this because you can push very small incremental changes to production, which significantly reduces the chance of confounding errors or major issues.

Note that this would be a Sisyphean task if your company doesn't have great logging/metrics reporting/testing/etc.

I once worked on a project for one of the largest corporations in the world. For the first part of my career I had only ever worked on smaller teams.

I was excited for the move to a large corporation where there would be amazing room for growth and learning.

I have to say that almost a year into my work on this project, I was absolutely stunned how inept this company was at coordinating a technology project.

Something a small team could accomplish in a matter of months was taking hundreds of developers, and hundreds more in supporting/operational roles, years to accomplish. My guess is the developers on this project would gladly trade places with Sisyphus.

It’s usually a Sisyphean task. Everyone wants to look and act cutting edge (“but Netflix!”) but nobody wants to make the necessary investments in the tooling, org structure, and management ability/support that is required to support that sort of deployment cadence (if your org focuses on who broke something instead of the process, and management doesn’t want to change that culture, all hope is already lost [based on experience in a large enterprise, YMMV]).

There are some legitimate needs for continuous deployment, the rest of it is cargo culting.

I think some organizations need it more than others. And I'm sure there's a lot of chasing the cool new thing. But I'd argue that almost every organization can benefit from continuous deployment and the discipline it requires.

The first time I switched from CI to full CD was circa 2011. I loved it because the mental bucket "later" went away. Except to the extent something was declared as a formal experiment in our A/B tests, code was either live or it wasn't. We were doing pair programming and committing every few hours, so aside from the little scratch-paper to-do list before we committed, there was no "later" for us. It made it real clear what our "good enough to ship" standards were. There was less room for bullshit. The resulting code was tighter, cleaner.

It also forced us to work much more closely as a team. We couldn't leave product questions for some end-of-iteration review. We had more mini-reviews with the product manager, and also improved our product judgment. Everybody trusted each other more. Partly because we had to, and partly because close collaboration is how you build trust.

It also shifted incentives further upstream. Suddenly there were no more big releases. No matter how big your vision, you had to learn to think of it in bite-sized pieces. It became less about answers, and more about questions. Not "Users need X!" but "What's the smallest thing we can do to see if users benefit from X?" Being able to make a lot of small bets made it easier to explore.

The Lean and Kanban folks talk a lot about "minimum WIP", where WIP is work in process. My CD experiences have definitely convinced me that they're right. Deploying smaller pieces more frequently requires a fair bit of discipline, but there are such huge gains in effectiveness and organizational sanity that I'll always try to work that way if I can.

There's continuous, and then there's continuous. Where I work (which is easy enough to find out if you're curious, and the same order of magnitude size of engineering team as Slack) we deploy hundreds of times a day. But that's because our units of deployment are small (and target-specific, so a single logical change can trigger multiple logical deployments). So we're not deploying the same thing hundreds of times per day, we're deploying hundreds of things once a day.

Make a change to a thing, which might take a few minutes or a few hours, get it reviewed and merged and it'll be in production a few minutes later.

Once you get past 500 engineers, even just completing a single task a week means 100 things to deploy every day: either you batch them together somehow or you work on the tooling to just get them to production without any fuss.
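
The arithmetic behind that, as a small Python sketch. The five-day work week is my assumption, and `deploys_per_day` is just an illustrative helper name.

```python
def deploys_per_day(engineers: int,
                    tasks_per_engineer_per_week: float = 1.0,
                    workdays_per_week: int = 5) -> float:
    """Average number of finished tasks that need deploying each working day."""
    return engineers * tasks_per_engineer_per_week / workdays_per_week


# 500 engineers, one task each per week, spread over 5 working days.
print(deploys_per_day(500))  # 100.0
```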

> There are some legitimate needs for continuous deployment, the rest of it is cargo culting.

Maybe, but I wouldn't go that far. Small companies already often do CD, because there's rarely a rigid deploy schedule. It's a practice people understand and feel the benefits of immediately. If you ask someone who moved from a small startup to a huge company what their biggest complaints are, I bet "longer/stricter deploy process" comes up 8/10 times.

When I think of cargo cult programming I think more of TDD or Agile: Practices that people aren't familiar with and often implement without understanding the benefits or reasoning.

For every developer that complains about the longer/stricter deploy process, I'd offer up for consideration deployments that went out through the CD pipeline where production data was mangled with no rollback possible. As with everything, it's about determining your appetite for risk.
Hmm, I don't see how that changes with longer/stricter deploy processes, unless you have some of the tooling that makes CD possible in the first place (automated checks, etc.).

I've certainly worked in places with very long and strict deploy processes that managed to mangle production data frequently. Even worse, because the deploy process was so strict and long, the bad code managed to stay on production for much longer than 10 minutes (the deploy time mentioned in the article).

There's some vague notion out there that long deploy process == safe, but there's very little evidence to suggest that's the case. If anything, it seems much more dangerous because larger changesets are going out all at once.

It goes back to my original comment above; if you have the proper tooling (tests that must pass before a deploy gets the green light, blue/green deploys, canaries, automated datastore snapshots/point-in-time recovery, granular control of the deployment process), I think continuous deployment provides a great deal of value above what you've invested into the process. But that investment is critical if you've bought into CD. Otherwise, it's "deploy and pray".
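
A rough Python sketch of how those pieces could gate a single deploy, under my own assumptions: `gated_deploy`, `DeployResult`, the callback names, and the 1% error threshold are all hypothetical, and the callbacks stand in for whatever test runner, snapshot tool, and traffic router you actually use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeployResult:
    released: bool
    reason: str


def gated_deploy(run_tests: Callable[[], bool],
                 take_snapshot: Callable[[], None],
                 deploy_canary: Callable[[], None],
                 canary_error_rate: Callable[[], float],
                 promote: Callable[[], None],
                 rollback: Callable[[], None],
                 max_error_rate: float = 0.01) -> DeployResult:
    """Run one deploy through a test gate, a snapshot, and a canary check."""
    # Gate 1: nothing ships unless the test suite is green.
    if not run_tests():
        return DeployResult(False, "tests failed")

    # Snapshot the datastore first so a bad deploy can be recovered from.
    take_snapshot()

    # Gate 2: send the change to a small slice of the fleet and watch it.
    deploy_canary()
    if canary_error_rate() > max_error_rate:
        rollback()
        return DeployResult(False, "canary unhealthy")

    # Both gates passed: roll out to everyone.
    promote()
    return DeployResult(True, "promoted")
```

Without the snapshot and rollback callbacks this collapses back into "deploy and pray", which is the point being made above.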
I think you're assuming that each deploy is somehow disruptive or blocking, as if systems enter some "deploy mode". This may be true of a fleet overall, but is not when you take an instance into account.

Instead imagine each deploy as an edge, and imagine them to be near instantaneous. With respect to an instance, and a user, and the observable effects of a deploy, this paints a more accurate picture. 100 times a day means one deploy every 14.4 minutes.
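
For anyone who wants to check the numbers, a quick Python sketch. The helper names are mine, and it assumes deploys are spread evenly over a 24-hour day, which real teams obviously don't do.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_between_deploys(deploys_per_day: int) -> float:
    """Average gap between deploy starts if they are spread evenly."""
    return MINUTES_PER_DAY / deploys_per_day


def average_deploys_in_flight(deploys_per_day: int, pipeline_minutes: float) -> float:
    """Average number of pipelines running at any moment."""
    return deploys_per_day * pipeline_minutes / MINUTES_PER_DAY


print(minutes_between_deploys(100))                  # 14.4
print(round(average_deploys_in_flight(100, 10), 2))  # 0.69
```

So with a ten-minute pipeline, on average less than one deploy is in flight at a time, and the cutover itself is a near-instant edge at the end of each one.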

How many times a day do you think Amazon deploys changes? Or Facebook? Or Google?

When I started at a real estate webhost in 2013, there was a "push Dev to Prod" script, and a scheduled task to run the script every 60 seconds. So we technically had 1440 pushes per day, though most developers were only working during daylight hours.

So having more pushes per day isn't necessarily the metric to maximize. Quality of code changes for each push is important, and this is where automated testing can be very valuable. The goal is for automated testing to be a "gatekeeper of bad code".

But even this system isn't perfect, and it's possible to deploy things that pass tests but still have show-stopping bugs. Or for the code to cause your tests to misbehave - I'm seeing this now with Tape.js on Travis, where Tape sees my S3 init calls as extra tests. Then my build fails because - of the 2 tests specified, 3 failed, and another 4 passed.

C'mon dude, it obviously doesn't require everyone in the company to spend 10 minutes for each deployment, it takes one person 10 minutes to deploy one thing.
Probably not even. One person starts a deploy, after 10 minutes they can deploy again.
You’re assuming a monolithic app. Maybe there are twenty services that each get deployed five times a day.