by snapcaster 721 days ago
If it's something controlled by the PR, why is it even a feature flag? It seems like you lose a huge chunk of the benefit and might as well just change the code at that point. To me this seems like the wrong place to be controlling them; I certainly don't want to have to make a PR and merge it when a site issue is happening.

Open to missing something though, curious what others' experience has been.

9 comments

It's good to have the current state of all the flags in source control, just like you do for things like infrastructure-as-code.

The distinction is that you have a different release process, or build a different artifact, from your main codebase. The codebase you are controlling with flags doesn't change when your flags do. This can be done with separate repos if you want one build per repo, but it doesn't have to be.

If you split feature flags into a different repository, then you're losing the benefit of having everything in a single repository, of having consistency and avoiding situations where your codebase refers to feature flags that don't exist, old feature flags that are no longer checked by the codebase, etc.
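A minimal sketch of the consistency check described above, comparing flags referenced in code against flags defined in config. The `is_enabled("...")` call convention and the flag names are assumptions for illustration, not something from the thread:

```python
import re

# Hypothetical convention: application code reads flags via is_enabled("flag-name").
FLAG_CALL = re.compile(r'is_enabled\(\s*["\']([\w.-]+)["\']\s*\)')


def referenced_flags(source_text: str) -> set[str]:
    """Collect every flag name passed to is_enabled(...) in the given source."""
    return set(FLAG_CALL.findall(source_text))


def check_consistency(source_text: str, defined_flags: set[str]) -> dict[str, set[str]]:
    """Compare flags used in code against flags defined in the flag config."""
    used = referenced_flags(source_text)
    return {
        "missing": used - defined_flags,  # referenced in code, not defined anywhere
        "stale": defined_flags - used,    # defined, but no longer checked by code
    }


code = '''
if is_enabled("new-checkout"):
    render_new_checkout()
if is_enabled('dark-mode'):
    enable_dark_mode()
'''
report = check_consistency(code, {"new-checkout", "old-banner"})
print(report)
```

With everything in one repository, a check like this can run in CI on every commit; with a separate flags repo it has to compare two independently moving histories.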

At this point, your production infrastructure is no longer solely one stateless server + one database, but two databases: your RDBMS and your GitOps repo tracking feature flags. Do you really get enough value from the second GitOps database compared to putting your feature flags in your main RDBMS?

With one repo you still have versions of this problem: have the flags been deployed as config to the flag system before the server build, and has the server build that removes usage rolled out fully by the time you remove the config? Using an RDBMS is an option, but it makes scaling harder, and you still need an audit trail, review processes, etc., so you eventually end up building a source control system on top of it (assuming you hit all the sharp corners and put time into solving them).

If you're feature flagging client code (i.e. somewhere you don't control rollouts, like mobile and web apps) that adds another layer of complexity.

While it's nice to have a simple system, having built one from scratch and used very mature feature flagging systems, my experience is that production systems hit almost all the edge cases quite quickly and flagging/experimentation systems are forced to evolve quite quickly to actually account for these issues.

Multi-repo or not isn't really an issue. My previous company had flag config in a separate repo, my current company has a monorepo, it doesn't really make a difference.

> Using an RDBMS is an option, but makes scaling harder

Modern Postgres scales vertically quite well on modern hardware.

> you still need an audit trail, review processes, etc,

But you need this anyway for the RDBMS in your architecture. You need an audit trail for when engineers need to get into the production database, and to show that their changes passed review, etc. My point is, if you need to build this for your RDBMS anyway, then you can build on top of that for your feature flag system if you throw that into your RDBMS as well.

> my experience is that production systems hit almost all the edge cases quite quickly and flagging/experimentation systems are forced to evolve quite quickly to actually account for these issues.

I think that's more an argument to use a commercial feature flag platform (like LaunchDarkly) instead of a FOSS option. A commercial platform is what I would prefer to recommend anyway! But, with the context of "choose a FOSS option", it seems to me like building on top of an RDBMS, rather than GitOps, makes more sense.

Many services won't need an RDBMS, and for those that do there's a difference between the control plane of an application (development and releases), and the data plane (users using the service).

This is a complex and nuanced topic, but on my previous team of ~6 where we built a custom solution, we decided against using an RDBMS for multiple reasons, and on my current team where we use the same flagging system across 15 or so >1m requests per second services, there's no way it would work for us. If it works for your use case, that's great! But my advice for anyone else reading would be to put a lot of effort into considering the options as it's hard to change later and has significant impact on how the flagging system is used.

As for whether to use a commercial platform... my preference is probably to build my own with what I need in a system that I can modify as needed, or a commercial platform if there's one ready to go at a good price with the right feature set. I probably wouldn't use an existing open source option here unless I was forking it and treating it as my own from then on, as I find these things need flexibility and customisation. I've yet to see a great open source option.

The PR is optional, but governance often requires documented peer approval for all production changes.

Flag changes can be pushed directly to the main branch with the correct repo permissions. When using the GitHub UI this involves just a little bit of typing and a few clicks.

>might as well just change the code at that point

If changing the code, running tests, building, and deploying is quicker and less risky then yes that makes more sense.

This seems like a reach. A nice benefit of having these server-owned configuration flags with a slick UI (like LaunchDarkly) is that they can be modified by people on the fly, and by people who may not even be engineers (like product managers). I imagine that if the ask is that they instead get GitHub permissions, make a PR, wait for a review, etc., then perhaps you are not competing with LaunchDarkly. Though, having Git-controlled server-owned configuration is still nice regardless.
Depends on the feature. In 99% of cases, I'd prefer an engineer to launch it and have the change tracked by source control
The sole reason I like feature flags is that I can quickly toggle off a change if it causes a problem. I'd hate to need to find someone to sign off on restoring service. Anything more than 1 or 2 clicks is just adding precious seconds to an outage. I've worked at places that gave broad implicit approval to developers to toggle away as needed. It worked well.
> I'd hate to need to find someone to sign off on restoring service.

In these shops, this gets handled via paging on-call engineers. The on-call is sometimes given more latitude if their actions are auditable.

Imagine being called out of bed to turn off a feature flag.

This is nonsense.

I'd agree, it seems like entirely the wrong place to put a feature flag. Personally I'd go for either a configuration file or a database and then have a process for updating the feature flags.

The best implementation I've seen was in a Java project. Features were enabled or disabled by either the properties file or the database. If a flag was set in the database, then that took precedence. New features would always be rolled out disabled in the properties file. Then in a controlled window the new features would be enabled for a few minutes and logs would be examined. If everything looked good the feature would then be enabled again. After a few days or weeks, the properties file would be updated to have the feature enabled by default and the flag in the database deleted in a later task.
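The precedence rule described above (database overrides the properties file) could be sketched roughly like this. It is in Python rather than Java, and the function name, the plain dicts standing in for the two stores, and the flag names are all assumptions for illustration:

```python
from typing import Mapping


def resolve_flag(
    name: str,
    db_flags: Mapping[str, bool],
    file_flags: Mapping[str, bool],
    default: bool = False,
) -> bool:
    """Database value wins if present; otherwise the properties-file value; otherwise the default."""
    if name in db_flags:
        return db_flags[name]
    return file_flags.get(name, default)


# New features ship disabled in the properties file.
file_flags = {"new-report": False, "fast-search": True}

# During the controlled window, an override is written to the database.
db_flags = {"new-report": True}

print(resolve_flag("new-report", db_flags, file_flags))   # database override applies
print(resolve_flag("fast-search", db_flags, file_flags))  # falls back to the file
```

Deleting the database row later simply drops the flag back to whatever the properties file says, which is why the file is updated to the new default first.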

If not a PR review, how would you propose people should build consensus around:

> I'm about to do this thing to mitigate the issue, does it look like the right thing?

It doesn't need to be a code change, can just be a flags change, but if it's a change at all then why not pin it to a commit so that rolling it back is easy and so that the commit sha can be an indicator of which flags are where.

Agreed that you lose the benefit of "instant updates" if it's always controlled by a PR.

But Git-style version control with history, diffs, branches and pull requests is pretty useful for feature flags and other "app configuration".

Version history and diffs are great for knowing what flag logic changed when + debugging what broke prod.

Branches let you test and preview flag logic changes in your own isolated branch (which you can point the SDK at) — this is a cleaner approach to having a few separate "environments" like development, staging, production which can drift from each other.

Branches are also great for refactoring the schema / structure of all your flags, e.g. deleting a bunch of flags in one go.

Pull requests and approvals are great for when you're making changes to sensitive flags. E.g. you can lock down specific flags.

Pull requests are also great for onboarding nontechnical team members like PMs or sales reps so they can safely make flag changes themselves but require approval from an engineer (at least while they learn to use the system). Empowering nontechnical people is also why a UI is important.

Branching and pull requests are also a great way to prevent conflicts / overwriting other team members' flag changes.

So Git-style features are pretty useful, but you also want the UI and you only want to enforce pull requests for specific flags or team members — this is what we built at Hypertune.

You can get the same comfort of fast updates and history with a feature flag system developed using event sourcing, without having the overhead of Git.
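A minimal in-memory sketch of what an event-sourced flag store might look like: changes are appended to a log, and the current (or any past) state is derived by replaying it. The class and field names are hypothetical, and a real system would persist the log somewhere durable:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlagChanged:
    seq: int       # position in the log, starting at 1
    flag: str
    enabled: bool
    actor: str     # who made the change, kept for auditing


class FlagLog:
    """Append-only log; flag state is derived by replaying events."""

    def __init__(self) -> None:
        self._events: list[FlagChanged] = []

    def record(self, flag: str, enabled: bool, actor: str) -> FlagChanged:
        event = FlagChanged(len(self._events) + 1, flag, enabled, actor)
        self._events.append(event)
        return event

    def state(self, up_to: Optional[int] = None) -> dict[str, bool]:
        """Replay events, optionally stopping at a given sequence number."""
        result: dict[str, bool] = {}
        for event in self._events:
            if up_to is not None and event.seq > up_to:
                break
            result[event.flag] = event.enabled
        return result

    def history(self, flag: str) -> list[FlagChanged]:
        """All recorded changes to one flag, oldest first."""
        return [e for e in self._events if e.flag == flag]


log = FlagLog()
log.record("new-checkout", True, actor="alice")
log.record("dark-mode", True, actor="bob")
log.record("new-checkout", False, actor="carol")  # emergency toggle-off

print(log.state())         # current state
print(log.state(up_to=2))  # state before the toggle-off
```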
I think the idea is that it's a separate flags repo?
Correct: the flags live in a separate repo.
That’s really an 80/20 situation. The flag repo fixes several large problems around visibility, coordination and state tracking of flag status, but introduces more friction into the system.

I’m not sure I have a comprehensive solution. I just know which ones I hate more than which others. The repo is the least obnoxious of the options.

Can you say more? I think I'm missing the point. In my mind that would make it even worse, because you have all the problems I mentioned plus merge conflicts being a possibility.
Well I don't think you are doing active development on your flags repo. I think it's just using Git as a database for what your current feature flags are.
One repo is the actual flag control, acting like a database.

So instead of controlling flags from a website, you get the benefits of git merge, PRs, reviews, documentation etc. without having to rebuild it.

I like the concept since it brings accountability. But it's really a need that larger orgs have, and by that point they have likely built a flag system internally, so transitioning is difficult.

We do this at my company and another huge benefit is the ability to test config at a particular point in time. You can even git bisect to find which flag flip caused a regression.
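To illustrate the bisect idea without shelling out to Git, here is a rough sketch of the same binary search over an ordered list of flag snapshots (one per commit). The snapshots, flag names, and the `is_bad` check are made up; `git bisect` does the equivalent over real commits:

```python
from typing import Callable, Sequence


def first_bad(snapshots: Sequence[dict], is_bad: Callable[[dict], bool]) -> int:
    """Return the index of the first bad snapshot.

    Assumes snapshots[0] is good, snapshots[-1] is bad, and that once the
    regression appears it stays (the same assumption git bisect makes).
    """
    lo, hi = 0, len(snapshots) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical flag history, oldest first.
snapshots = [
    {"new-cache": False, "dark-mode": False},
    {"new-cache": False, "dark-mode": True},
    {"new-cache": False, "dark-mode": True, "beta-search": True},
    {"new-cache": True, "dark-mode": True, "beta-search": True},
    {"new-cache": True, "dark-mode": True, "beta-search": False},
]

# Stand-in for "run the test suite against this config".
culprit = first_bad(snapshots, is_bad=lambda flags: flags["new-cache"])
before, after = snapshots[culprit - 1], snapshots[culprit]
changed = {k for k in after if before.get(k) != after[k]}
print(culprit, changed)
```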
So you just flip feature flags in prod all willy nilly? No process?
There are more options than store in git, or flip willy nilly.

For systems of sufficient scale, it's fairly standard to keep flag changes outside of Git so that they can be flipped without a PR. That way the flag change UI can apply other validation steps before any change is attempted, such as ensuring valid enrolment ranges (no accidental overlap, and no accidental rollouts to 100% instead of 10%), and the associated rollout analytics can be shown alongside the changes.
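A rough sketch of the kind of validation such a UI could run before accepting a change. The `Enrolment` shape (percent buckets as half-open ranges) and the `max_exposure` limit are assumptions for illustration, not any particular product's model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrolment:
    variant: str
    start: int  # first percent bucket, inclusive (0-99)
    end: int    # last percent bucket, exclusive (1-100)


def validate(enrolments: list[Enrolment], max_exposure: int) -> list[str]:
    """Return a list of problems; an empty list means the change looks safe."""
    errors: list[str] = []

    # Each range must be non-empty and sit inside 0-100.
    for e in enrolments:
        if not 0 <= e.start < e.end <= 100:
            errors.append(f"{e.variant}: invalid range {e.start}-{e.end}")

    # Adjacent ranges (sorted by start) must not overlap.
    ordered = sorted(enrolments, key=lambda e: e.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            errors.append(f"{prev.variant} overlaps {cur.variant}")

    # Guard against rolling out to far more users than intended.
    exposure = sum(e.end - e.start for e in enrolments)
    if exposure > max_exposure:
        errors.append(f"exposure {exposure}% exceeds limit {max_exposure}%")

    return errors


ok = validate([Enrolment("control", 0, 5), Enrolment("treatment", 5, 10)], max_exposure=10)
oops = validate([Enrolment("treatment", 0, 100)], max_exposure=10)
print(ok, oops)
```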

You can also override things in emergencies more easily, which is the parent's point.

But it’s also common to stick those flags back under some sort of audit system so we know why and how weird states got set. The simplest way is a separate repo with simpler rules for pushing to master.

Though you could also create your own audit system (just make sure it functions even when the entire site is down)

This is all decided by business needs, not by engineer preference. If devs aren’t pulling the levers it is good to expose them in an accessible interface, not plaintext.
Let’s not pretend like these “business needs” don’t often boil down to a couple of tropes and that the story doesn’t change after a couple of post mortems are turned in for perfectly predictable failure modes.
Chatbots are great for this too. You get flippiness, but you also get auditing implicitly from the chat log.
You're missing the parent's point. It's not about lacking a change control process, but about having the ability to instantly change the state of a flag when necessary. Moreover, these two can coexist effectively.
With Continuous Delivery you should be able to roll back the deployment, if the last change is the only one you need to revert. Of course that stops working when you share the flag system across teams with separate milestones. Your guys are gonna want to flip toggles while my guys are thinking about flipping others.
It could probably be in a separate repository with access controls and/or deployment lifecycle.

In particular, a change to the yaml for feature flags could bypass most of your build and test pipeline, and changes could be deployed more quickly.

OTOH, you still need to figure out a deployment strategy for it.

Two things,

1. The time to delivery is potentially much much shorter.

2. There's a built-in rules engine for targeting. You could integrate this in! But it feels nice having it separate.
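For readers unfamiliar with what a targeting rules engine does, here is a generic sketch: the first rule whose attribute matches decides, and a stable hash puts each user in a percent bucket. This is not the API of the tool being discussed; `Rule`, `bucket`, and `is_enabled` are hypothetical names:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    attribute: str      # user attribute to inspect, e.g. "plan"
    values: frozenset   # attribute values this rule applies to
    percent: int        # share of matching users who get the flag (0-100)


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user: dict, rules: list[Rule], default: bool = False) -> bool:
    """First matching rule wins; users matching no rule get the default."""
    for rule in rules:
        if user.get(rule.attribute) in rule.values:
            return bucket(flag, str(user["id"])) < rule.percent
    return default


rules = [
    Rule("plan", frozenset({"pro"}), percent=100),   # everyone on the pro plan
    Rule("country", frozenset({"NZ"}), percent=10),  # 10% of users in NZ
]

print(is_enabled("new-checkout", {"id": "u1", "plan": "pro"}, rules))
print(is_enabled("new-checkout", {"id": "u2", "plan": "free", "country": "DE"}, rules))
```

Hashing the flag name together with the user id keeps each user's result stable across requests while giving different flags independent rollouts.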