Hacker News new | ask | show | jobs
by echelon 502 days ago
> if you're not just listing the nines on your system as a whole

At scale the nines of your feature flagging system become the nines of your company.

We have a massive distributed systems architecture handling billions in daily payment volume, and flags are critical infra.

Teams use flags for different things. Feature rollout, beta test groups, migration/backfill states, or even critical control plane gates. The more central a team's services are as common platform infrastructure, the more important it is that they handle their flags appropriately, as the blast radius of outages can spiral outwards.

Teams have to be able to competently handle their own flags. You can't be sure what downstream teams are doing: if they're being safe, practicing good flag hygiene, failing closed/open, keeping sane defaults up to date, etc.

Mistakes with flags can cause undefined downstream behavior. Sometimes state corruption (eg. with complicated multi-stage migrations) or even thundering herds that take down systems all at once. You hope that teams take measures to prevent this, but you also have to help protect them from themselves.

> slowly-changing, centrally-managed system state at runtime

With flags being so essential, we have to be able to service them with near-perfect uptime. We must be able to handle application / cluster restart and make sure that downstream services come back up with the correct flag states for every app that uses flags. In the case of rolling restarts with a feature flag outage, the entire infrastructure could go hard down if you can't do this robustly. You're never given the luxury of knowing when the need might arise, so you have to engineer for resiliency.

An app can't start serving traffic with the wrong flags, or things could go wrong. So it's a hard critical dependency to make sure you're always available.

Feature flags sit so closely to your overall infrastructure shape that it's really not a great idea to outsource it. When you have traffic routing and service discovery listening to flags, do you really want LaunchDarkly managing that?