Hacker News new | ask | show | jobs
by mabbo 1004 days ago
> Make feature flags short-lived. Do not confuse flags with application configuration.

This is my current battle.

I introduced feature flags to the team as a means to separate deployment from launch of new features. For the sake of getting it working and used, I made the mis-step of backing the flags with config files with the intent to get Launch Darkly or Unleash working ASAP instead to replace them.

Then another dev decided that these Feature Flags look like a great way to implement permanent application configs for different subsets of entities in our system. In fact, he evangelized it in his design for a major new project (I was not invited to the review).

Now I have to stand back and watch as the feature flags are being used for long-term configurations. I objected when I saw the misuse- in a code review I said "hey that's not what these are for"- and was overruled by management. This is the design, there's no time to update it, I'm sure we can fix it later, someday.

Lesson learned: make it very hard to misuse meta-features like feature flags, or someone will use them to get their stuff done faster.

11 comments

Sadly, this is a battle you are destined to lose. I have almost completely given up. The best you can aim for is to use feature flags better rather than worse.

    - Some flags are going to stay forever: kill switches, load shedding, etc. (vendors are starting to incorporate this in the UI)
    - Unless you have a very-easy-to-use way to add arbitrary boolean feature toggles to individual user accounts (which can become its own mess), people are going to find it vastly easier to create feature flags with per-use override lists (almost all of them let you override on primary token). They will use your feature flags for:
      - Preview features: "is this user in the preview group?"
      - rollouts that might not ever go 100%: "should this organization use the old login flow?"
      - business-critical attributes that it would be a major incident to revert to defaults: "does this user operate under the alternate tax regime?"
You can try to fight this (indeed, especially for that last one, you most definitely should!), but you will not ever completely win the feature flag ideological purity war!
Thank you for this great list of the immense business value derived from "misusing" feature flags!
In my org, I think I’ve go the feature flag thing mostly down.

We started with a customer specific configuration system that allows arbitrary values matching a defined schema. It’s very easy to add to the schema (define the config name, types, and permissions to read or write it in a JSON schema document).

We have an administration panel with a full view of the JSON config for our support specialist and and even more detailed one for developers.

Most config values get a user interface as well.

From there we just have a namespace in the configuration for “feature flags”. Sometimes these are very short lived (2-4 sprints until the feature is done), but others can last a lot longer.

There are an unfortunate couple that will probably never go away at this point (because of some enterprise customer with a niche use case in the “legacy” version of the feature that we’ve not yet implemented compatibility with and I don’t know when it will get on our roadmap to do so), but in the end they can just be migrated into normal config values if needed.

A little tooling layer on top lets us query and write to the configs of thousands of sites at once as well.

We have an interesting hybrid between the two that I'd like your take on. When we release new versions of our web client static assets we have a version number that we bump that moves folks over to the new version.

1. We could stick it in a standard conf system and serve it up randomly based on what host a client hits. (Or come up with more sophisticated rollouts)

2. Or we can put it as "perm" conf in the feature flag system and roll it out based on different cohorts/segments.

I'm leaning towards #2 but I'd love to understand why you want to prohibit long lived keys so I can make a more informed choice. The original blog posts main reasons were that FF systems favor availability over consistency so make a pour tool if you need fast converging global config, which somewhat becomes challenging here during rollbacks but is likely not the end of the world.

1) If you do Slack, then I recommend you join the #openfeature channel on the CNCF Slack. The inviter is here: https://communityinviter.com/apps/cloud-native/cncf

2) The downside of rolling it out based on host is that you could refresh your page, hit a different host, and see the UI bouncing back and forth between versions. As long as you always plan to roll things to 100%, this is the perfect use case for a feature flag.

Or... see them for what they are: runtime configuration. The name implies a use case scenario, but in reality it's just a configuration knob. With a good UI, it's a pretty damn convenient way to do runtime configuration.

So of course they'll be used for long-term configuration purposes, especially under pressure and for gradual rollouts of whole systems, not just A/B testing features.

This hits the nail on the head.

The term "feature flag" has come to inherently have a time component because features are supposed to eventually be fulled GA'd.

What I've seen in practice is feature flags are never removed so a better way to think about them is as a runtime configuration.

I think the reason feature flags are never removed is because the timeframe that a given feature-flag is top-of-mind is also when it's at its most useful. Later when it's calcified in place and the off-state may be broken/atrophied, no one is really thinking about it.

I'm also not convinced it's always a huge problem. I can imagine sometimes it is, but in most codebases I've worked on, it's more of an annoyance but not cracking the top 3 or 5 biggest problems we wanted to focus on.

IMHO the best solution is not something heavy handed like a policy that we only use run-time config for fixed timeframes, or a process where we regularly audit and prune old flags. It's simply to keep a record of the config changes over time so anyone interested can see the history, and a culture where every engineer is encouraged to take a little extra time to verify and remove dead stuff whenever it crosses their path .

The mental overhead of reading code like this is massive. Leaving feature flags in with the alternate branch left to rot leads to a codebase that is nearly impossible to understand. No purpose is served by not deleting the now unused branch except you save one developer an afternoon of work. But that time is quickly recouped when the entire team, and especially new hires, only have half as much code to understand.
> What I've seen in practice is feature flags are never removed so a better way to think about them is as a runtime configuration.

SaaS won't sell itself unless it redefines the problem and presents itself as a solution...

There is a need for runtime configurations, yes, but it's important to put them behind an interface intended for that, and not one intended for something else.
I can immediately see if the config is being requested, which system requests it, what are the metadata of the request, etc. I can do conditional rollout of a configuration based on runtime data. I can reset the configuration to a know-good failsafe default without asking for approval with a break-glass button. I can schedule a rollout and get a reviewer for the config change.

IME the feature flag interface is next to perfect for runtime configuration. I don't care for intended usage at all. You could say feature flags have found a great product-market fit, just that a segment of the market is a bit unexpected but makes perfect sense if you think about it.

This gets messy at larger scales, both as teams grow and software grows.

Resetting to a know failsafe works as long ask the risk of someone changing a backend service (or, multiple services) at the same time is low. Once it isn't, you can most definitely do more damage (and make life harder for oncall).

Who controls the runtime config? One person? Half a dozen? One hundred plus? Is it being gated by approvals, or can anyone do it? What about auditability? If something does go wrong, how easily can I rule out you turning on that flag?

Finally there is simply the sheer permutations you introduce here. A feature flag is binary in many cases: on or off. A config could be in any number of states.

These things make me nervous as an architect, and I've seen well intentioned changes fail when good flag discipline wasn't followed. Using it as fullblown runtime config seems like a postmortem waiting to happen.

I am tempted to agree: if separating the two is key (I’m not convinced that it is, but happy to assume) why not copy the interface and the infrastructure of the feature flag and offer it as a configuration tool.

I feel like you could easily add a status to flags, to mark whether they are part of a release process, or a permanent configuration tool, and in the latter case, take them off the release interfaces.

I think unleash offers "toggle type" which can take values that describe whether it's a pure feature flag, something operation, config, etc.
Could you expand on what you think the different interfaces should be? you keep stating that these things ought to be distinct but haven't explained why beyond dogma.
Our FF system uses our config system as its system of record. There's some potential for misuse, and it's difficult to apply deadlines. On the plus side all our settings are captured in version control. Before they were spread out over several systems, one of which had an audit system that was pure tribal knowledge for years.
The main challenge is when things go wrong. Feature flags are designed for high-rate evaluation with low latency responses. Configuration usually doesn't care that much about latency as it's usually read once at startup. This context leads to some very specific tradeoffs such as erring to availability over consistency, which in the case of configuration management could be a bad choice
Yeah, and assuming they are done well, they probably have better analytics and insights attached to them than anything else except perhaps your experiments!
Long lived features flags is a development process bug, I'm not sure we can solve it with the feature toggle system.

I'm at the point of deciding that Scrum is fundamentally incompatible with feature flags. We demo the code long before the flag has been removed, which leads to perverse incentives. If you want flags to go away in a timely manner you need WIP limits, and columns for those elements of the lifecycle. In short: Kanban doesn't (have to) have this problem.

And even the fixes I can imagine like the above, I'm not entirely sure you can stop your bad actor, because it's going to be months before anyone notices that the flags have long overstayed their welcome.

I'm partial to flags being under version control, where we have an audit trail. However time and again what we really need is a summary of how long each flag has existed, so they can be gotten rid of. The Kanban solution I mention above is only a 90% solution - it's easy to forget you added a flag (or added 3 but deleted 2)

I faced something similar, and I think it's unavoidable. Give people a screwdriver and they'll find a way of using it as a hammer.

The best you can do is expect the feature flagging solution to give some kind of warning for tech debt. Then equip them with alternative tools for configuration management. Rather than forbidding, give them options, but if it's not your scope, I'd let them be (I know as engineers this is hard to do :P).

> Give people a screwdriver and they'll find a way of using it as a hammer.

I feel like feature flags aren't that far off though. They're fantastic for many uses of runtime configuration as mentioned in another comment.

There's multiple people in this thread complaining about "abuse" of feature flags but no one has been able to voice why it's abuse instead of just use beyond esoteric dogma.

Allow me to try:

Feature Flags inherently introduce at least one branch into your codebase.

Every branch in your codebase creates a brand new state your code can run through.

The number of branches introduced by Feature Flags likely does not scale linearly, because there is a good chance they will become nested, especially as more are added.

Start with even an example of one feature flag nested inside another. That creates four possible program states. Four is not unreasonable, you can clearly define what state the program should be in for all four states.

Now scale that to a hundred feature flags, some nested, some not.

It becomes impossible to know what any particular program state should be past the most common configurations. If you can't point to a single interface in a program and tell me all of the possible states of it, your program is going to be brittle as hell. It will become a QA nightmare.

This is why Feature Flags should be used for temporary development efforts or A/B testing, and removed.

Otherwise you're going to have a debugging nightmare on your hands eventually.

Edit: Note that this is different from normal runtime configurations because normally runtime configurations don't have a mix of in-dev options and other temporary flags. Also, they aren't usually set up to arbitrarily add new options whenever it is convenient for a developer.

Sorry, not buying it.

Branches are difficult to reason about? Yes, I agree.

Are branches necessary to make the product behave in a different way in some circumstances? Most of the time.

Do those circumstances require a branch? Unless you’re super confident about some part of code, yes? But why would you be?

Runtime configuration is not about making QA easy. It’s introduced because QA has been hell already so you can control rollout of code which you know wasn’t properly QA’d - or it was but turns out the thing you built isn’t the thing users want and the release cycle is too long to deploy a revert.

I’d say ‘branches are bad but alternatives are worse’.

There is nothing worse than code with dozens to hundreds of possible configuration states that you must test for every new feature.

If your QA was bad before, you've made it worse.

"I can toggle it off without pushing a new release" is a terrible bandaid for the problem.

The fundamental diff between feature flags and config is the former is meant to be a soft deploy of code where everyone is expected to eventually be on the new code. Thus it should have a timer built in where it stops, and you should consider all new customers launching with it on.

As for why: if you don't deprecate the feature flag in some time span, you're permanently carrying both code paths. With ongoing associated dev and qa resources and costs against your complexity budget.

Permanent costs should only be undertaken after careful consideration, and should be outside the scope of a single dev deciding to undertake them. Whereas flags should be cheap to add to enable dev to get stuff into prod faster while retaining safety.

Permanently making something a config choice should be done after heavier deliberation because of the aforementioned costs, and you often want different tools to manage it. Including something heavier duty than a single checkbox/button in your internal CS admin tooling. These are often tied into contracts or legal needs, and in many cases salesforce should be the source of truth for them. Or whatever CPQ system you're using.

I feel like this is a solvable problem: 1) make feature flags be configured to have an expiration date. If over the expiration date, auto-generate a task to clean up your FF 2) If you want to be extra fancy, set up a codemod to automatically clean up the FF once it's expired

I don't see the problem with developers using flags for configuration as a stopgap until there's a better solution available.

> automatically clean up the FF once it's expired

Um what? How could that ever work. It's like you are trying to find new exciting ways to break prod.

It can be done by opening a PR, I haven't tried it yet, but I'm curious to try out https://github.com/uber/piranha or maybe hear some experiences if someone has used it
Which would be instantly rejected because the flag is still being used.
AFAIK, it'd only open a PR if the flag is fully enabled and has some heuristics to determine when it's safe to remove. Honestly, I haven't tested it but I'm curious to know if someone had either good or bad experiences.

If all the PRs are instantly rejected, that would be a bad sign, but I couldn't find someone who effectively used it. I mean, it's been around for a while but it didn't spread out, so that already gives me some hint

If the cleanup only happens if the flag is not used, then the "expiration date" is basically meaningless. You can either delete it or you can't. Who cares if it's expired or not.
Sounds like "other dev" found some business case they could unblock with existing system, and you thought the business was better off not solving that, or finding a more expensive solution.

Curious how you plan to justify cost to "fix it" to management. If it ain't broke...

I think it's better to admit they actually are config, just a different kind of config that comes with an expiration date.

Accepting reality in this way means you'll design a config management system that lets you add feature flags with a required expiration date, and then notifies you when they're still in the system after the deadline.

I agreed. My perspective is that there are two kinds of feature flags: temporary and permanent.

Temporary ones can be used to power experiments or just help you get to GA and then can be removed.

Permanent ones can be configs that serve multiple variations (e.g. values for rate limits), but they can also be simple booleans that manage long term entitlements for customers (like pricing tiers, regional product settings, etc.)

We did the same. We were early adopters of unleash and wrangled it to also host long term application configuration and even rule based application config.

The architecture of unleash made it so simple to do in unleash vs having to evaluate, configure, and deploy a separate app config solution.

Victim of your own success. As others were saying, when it works for short-lived its easy/no effort to use it for long-lived configurations.
Thanks for sharing. I have seen systems grow in to thousands of flags, where most developers does not know what a particular flags do anymore.
It’s one of the main reasons to start with something like unleash because they have stale flag warnings built in. Plus, since you already have a UI it’s harder for it to be hijacked.
The solution is not to use feature flags. Or maybe have them expire. Oh, also, discipline the developers who do this.