by elliot07 498 days ago
I agree with a lot of this, except for the part about de-risking deployments. That should not be a reason why to adopt a feature flag platform. It is a symptom of a bad deployment pipeline that should be fixed, which is a whole other story.
6 comments

I disagree that using feature flags to de-risk deployments is a symptom of bad deployment pipelines.

There are several aspects of deployments that are in tension with each other: safety, deployment latency, and engineering overhead are how I'd break it down. Every deployment process is a tradeoff between these factors.

What I (maybe naively) think you're advocating is writing more end-to-end tests, which moves the needle towards safety at the expense of the other factors. In particular, having end-to-end tests that are materially better than well-written k8s health checks (which you already have, right?) is pretty hard. They might be flaky, they might depend on a lot of specifics of the application that are subject to change, and they might just not be prioritized. In my experience, the highest-value end-to-end tests are based on learned experience of what someone already saw go wrong once. Writing comprehensive tests before the feature is even out results in many low-quality tests, which are an enormous drain on productivity: you have to write them, maintain them, and deal with the flaky ones. It is better, I think, to have non-comprehensive end-to-end tests that provide as much value as possible for the lowest overhead on human resources. And the safety tradeoff we make there can be mitigated by having the feature behind a flag.

My whole thesis, really, is that by using feature flags you can make better tradeoffs between these than you otherwise could.

> That should not be a reason why to adopt a feature flag platform

It's one of the two big reasons. The first is the ability to roll out features gradually and separate deployments from feature releases, and the second is the ability to turn new features off when something goes wrong. Even part of the motivation for A/B testing is de-risking.
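
To make those two reasons concrete, here is a minimal sketch of a flag check that supports both a gradual rollout and a kill switch. Everything in it is hypothetical (the `Flag` class, `bucket`, `is_enabled`, and the hashing scheme are my own illustration, not any particular platform's API):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    """Hypothetical flag record; field names are illustrative only."""
    name: str
    enabled: bool = True      # kill switch: False turns the feature off for everyone
    rollout_percent: int = 0  # share of users (0-100) who should see the feature


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    """Return True if this user should get the feature right now."""
    if not flag.enabled:
        return False  # the kill switch wins over any rollout percentage
    return bucket(flag.name, user_id) < flag.rollout_percent
```

Because the bucket depends only on the flag name and the user ID, raising `rollout_percent` from 10 to 50 keeps the original 10% of users in the rollout, and setting `enabled` to `False` turns the feature off for everyone without a deploy.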

The risk of deployments isn’t entirely technical. Depending on your business and customer base, it might be necessary for some groups to have access to the feature earlier or later than others.
Strong disagree here. My whole org does not roll out changes without feature flags at all, and whenever someone doesn't follow this policy they cause large-scale incidents. Feature flags are actually a sign that the deployment pipeline is very sane and mature, because people understand that any new code comes with unexpected risks and that we should prevent those risks from taking down systems.
Sometimes the only way to try out a distributed system is to run it in prod and see what happens. Having the tools to flip behaviour within 1 second globally can be a useful escape hatch. When you get to large enough scales “just roll back” is not always good enough. I deploy systems with tens of thousands of nodes and we specifically have to rate limit how fast we deploy so we don’t cause thundering herds.
Very few teams have instant deployments. Even fast systems take a few minutes to run. If you can turn off a flag faster (because it’s a DB record), then you should do that.
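
As a rough illustration of why flipping a DB record beats a redeploy, here is a sketch of a flag reader with a short cache. The names (`CachedFlags`, `is_on`, `ttl_seconds`) are made up for this example, and the plain dict is only a stand-in for a real flag table:

```python
import time
from typing import Callable, Dict, Tuple


class CachedFlags:
    """Reads flags from a store and caches each value for ttl_seconds."""

    def __init__(self, store: Dict[str, bool], ttl_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store = store  # stand-in for a DB table of flag records
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the TTL can be tested without sleeping
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def is_on(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the store lookup
        value = self._store.get(name, False)  # unknown flags default to off
        self._cache[name] = (now, value)
        return value
```

With this shape, a flag flipped in the store is seen by each process within `ttl_seconds` of its last read, so the time to turn something off is bounded by the cache TTL rather than by how long a deploy takes.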