Hacker News
by danpalmer 1104 days ago
Something I love about the experiment tooling I've used at a few places now (Thread, Google) is that state has typically been stored in source control. That is, not just the usage of flags in code, but also the rollout/experiment definitions, the share of traffic allocated to each branch, eligibility requirements, etc. This makes it easy to see the current state without going to a UI, and much easier to build tooling around that state, because the API data sync problem disappears.
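As a rough illustration of why having the state in the repo helps tooling: once definitions live in a file, a script can query them directly with no API sync. The sketch below assumes a hypothetical `flags.json` layout (branch names mapped to traffic percentages, plus eligibility rules); neither the schema nor the `fully_rolled_out` helper comes from Thread's or Google's tooling.

```python
import json

# Hypothetical flag/experiment definitions as they might be committed to a
# repo (e.g. flags.json). The schema is illustrative only.
FLAGS_JSON = """
{
  "new_checkout": {
    "branches": {"control": 0, "treatment": 100},
    "eligibility": {"countries": ["GB", "US"]}
  },
  "search_ranking_v2": {
    "branches": {"control": 50, "treatment": 50},
    "eligibility": {"countries": ["GB"]}
  }
}
"""


def fully_rolled_out(flags: dict) -> list[str]:
    """Return flags where a single branch receives 100% of traffic.

    Such flags no longer change behaviour, so they are cleanup candidates.
    """
    return sorted(
        name
        for name, spec in flags.items()
        if 100 in spec["branches"].values()
    )


flags = json.loads(FLAGS_JSON)
print(fully_rolled_out(flags))  # ['new_checkout']
```

A check like this could run in CI to point out experiments that have finished rolling out and are ready to be removed from the code.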

Looking at DevCycle, it seems you've not taken this approach, is that right? Scanning your docs it seems there's an API to update state, but that fundamentally it's kept in your database, not in code. In my experience this isn't the best dev experience, so as dev experience is your USP, I'm interested in why you think this is a better experience, what benefits it can bring over a "git-ops" style, or what I've missed in your docs.

Feature flags in source control seems to be missing the point.

Separating feature flags from source lets you decouple your code releases from your feature releases. Requiring a deploy of new code to enable or disable a feature seems to negate almost all of the benefit of using feature flags in the first place.

They don't necessarily have to be rolled out as part of the same release process. You can still decouple them without giving up version control, which to me is also preferable to just toggling things through a GUI.

Having a release process for flags also allows you to run integration tests with those flags, canary alerting and automated rollbacks.
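A minimal sketch of the automated-rollback decision such a pipeline might make, assuming it can collect request and error counts for a canary cohort (users on the new flag state) and a baseline. The `Metrics` type, `should_roll_back` and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical request/error counts for one cohort."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # A cohort that saw no traffic is treated as error-free.
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(baseline: Metrics, canary: Metrics,
                     max_ratio: float = 1.5, min_requests: int = 1000) -> bool:
    """Return True if the canary's error rate exceeds max_ratio times baseline.

    Thresholds are illustrative placeholders, not recommendations.
    """
    if canary.requests < min_requests:
        return False  # too little data to judge yet; keep the change running
    return canary.error_rate > baseline.error_rate * max_ratio


baseline = Metrics(requests=10_000, errors=20)  # 0.2% errors
print(should_roll_back(baseline, Metrics(2_000, 12)))  # True  (0.6% > 0.3%)
print(should_roll_back(baseline, Metrics(2_000, 4)))   # False (0.2%)
```

Because the flag change went through a release process, "roll back" here can simply mean re-deploying the previous flag file.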

Yes this is how I view it. Just because it's in code doesn't mean it's the same codebase, same deployment process, same servers, or anything else. Code is the source of truth and log of how that truth changed over time.

Really interesting take on your experience with experimentation / feature flags and wanting that state to live in code. I assume that follows the more Terraform-like infrastructure-as-code approach, which certainly makes sense for rolling out infrastructure changes. We have a V1 of a Terraform provider through which we hope to offer more control over infrastructure changes directly, but you are correct that it has yet to be our core focus. Another way we hope to expose that state outside of our UI is a great CLI, which we are actively working on.

Our primary use case has been working with product development teams who have adopted Feature Flags as part of their workflows, using DevCycle to help them deploy features to their end users faster and more safely, even deploying continuously as we do. Generally, customers integrate DevCycle into their websites / mobile apps + API servers to control those feature deployments. A couple of customers have used us at the infrastructure layer; one of the best use cases I've seen is a proxy service controlling the rollout of a new infrastructure stack. I'd love to dig into this use case more deeply and see how we can better support it in the future; "Ops" flags are one of the flag types we are looking to support.

I would say that for customers looking to disconnect the deployment of code from the release of features, our approach where you can "release" features at any time has many advantages over "git-ops" style configurations. But certainly, for Ops use-cases where you are controlling infrastructure changes, we are believers in "git-ops" and use it ourselves. The challenge comes with how to connect those two deployment styles effectively.

Thanks for the detailed reply, although I feel like I may have miscommunicated the aim here. I'm not really thinking about what is being launched, but how, and where the truth lives.

A CLI is certainly a nice feature, but it brings decisions that need to be made: who runs it, where do they run it, when, how do you know what was run, how do you deploy the CLI, and so on. The same can be said for, say, running your test suite – and the solution there is to have CI do it for you. Sure, you can run it yourself, but the run that matters is the one when you merge your branch, and that's done in a controlled environment typically defined in code.

Rolling out features/flags is the same, and I think if the state of all the flags, features, traffic allocations, targeting, and so on, is all defined in code, then all those questions you get with a CLI go away. Who runs it? An automated process. Where do they run it? In a controlled environment, not on a dev machine. How do you know what was run? It's all there in code. How do you deploy the CLI? You don't need to.
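The "automated process" could work much like a Terraform plan: diff the state defined in the repo against what the flag service is currently serving, then apply only the differences. A sketch, with `plan_changes` as a hypothetical name and flags represented as plain dicts:

```python
def plan_changes(desired: dict, live: dict) -> list[tuple[str, str]]:
    """Compute (action, flag_name) pairs that make `live` match `desired`.

    `desired` is the flag state read from the repository; `live` is what the
    (hypothetical) flag service currently serves. Sorting keeps the plan
    deterministic, which makes it easy to review in CI logs.
    """
    plan = []
    for name in sorted(desired.keys() | live.keys()):
        if name not in live:
            plan.append(("create", name))
        elif name not in desired:
            plan.append(("delete", name))
        elif desired[name] != live[name]:
            plan.append(("update", name))
    return plan


desired = {"new_checkout": {"rollout": 50}, "search_v2": {"rollout": 100}}
live = {"search_v2": {"rollout": 10}, "old_banner": {"rollout": 0}}
print(plan_changes(desired, live))
# [('create', 'new_checkout'), ('delete', 'old_banner'), ('update', 'search_v2')]
```

Printing the plan on the pull request and applying it only after merge answers "how do you know what was run?" with the commit history.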

> for customers looking to disconnect the deployment of code from the release of features, our approach where you can "release" features at any time has many advantages over "git-ops" style configurations

I'm interested in why you say this has many advantages, because I don't see why a git-based workflow couldn't support it. For example, DevCycle could subscribe to notifications for the git repo and update its internal state whenever new changes land. That would be preferable to a UI or CLI, because the whole state is there in a machine-readable format, ready for tooling to use.
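One way such a subscription could look, assuming a webhook payload shaped like GitHub's push event (a `ref` plus a `commits` list whose entries carry `added`, `modified` and `removed` paths). The `flags/` directory and both function names are hypothetical.

```python
FLAG_DIR = "flags/"  # hypothetical location of flag definitions in the repo


def changed_flag_files(push_event: dict) -> set[str]:
    """Collect flag-definition paths touched by any commit in the push."""
    paths = set()
    for commit in push_event.get("commits", []):
        for key in ("added", "modified", "removed"):
            paths.update(commit.get(key, []))
    return {p for p in paths if p.startswith(FLAG_DIR)}


def should_resync(push_event: dict, default_branch: str = "main") -> bool:
    """True if the flag service should reload its state from the repo."""
    # Only merged changes on the default branch count as the source of truth.
    if push_event.get("ref") != f"refs/heads/{default_branch}":
        return False
    return bool(changed_flag_files(push_event))


event = {
    "ref": "refs/heads/main",
    "commits": [{"added": [], "modified": ["flags/checkout.json", "src/app.py"], "removed": []}],
}
print(changed_flag_files(event))  # {'flags/checkout.json'}
print(should_resync(event))       # True
```

The commit list in a push payload may be incomplete for very large pushes, so a sturdier version might diff the before/after commits directly rather than trusting the list.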

You can use a git-based workflow for feature flags, that's likely how most teams will start using flags in their code with environment variables and infrastructure state. However, most deployment pipelines in the wild are very slow and owned by engineering, limiting the value of doing git-based state for your Feature Flags.

I've seen that disconnecting the deployment of code from releasing of features can be transformative in various ways that fit better with an API-based model for Feature Flags:

- Many flags / remote config changes in production environments are made by non-developer members of a team.

- Coordinating releases across multiple platforms that have different deployment cycles. For example, deploying a new feature to Mobile + Web + APIs simultaneously.

- Enabling / disabling flags in real-time across your stack to respond to incidents.

- Support / Sales teams using Flags to gate features for customers.

- Remote config to populate data for UIs, to act as a CDN for content.

- Rolling out infrastructure changes, being able to roll back changes instantly without another deployment.

(And for context: our Feature Flag decisioning is done locally for our local bucketing server SDKs)
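Local decisioning usually means the SDK evaluates rollouts in-process from a cached copy of the config. A common way to do that is deterministic hash bucketing, sketched below. This is a generic technique, not a description of DevCycle's actual algorithm, and `bucket`, `is_enabled` and `BUCKETS` are hypothetical names.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: 0.01 percentage points


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """Evaluate a percentage rollout in-process, with no network call."""
    return bucket(flag_key, user_id) < rollout_percent / 100 * BUCKETS


enabled = sum(is_enabled("new_checkout", f"user-{i}", 25) for i in range(10_000))
print(enabled)
```

The count for 10,000 synthetic users at a 25% rollout should land close to 2,500. Because the hash includes the flag key, a user in the first 10% of one flag is not automatically in the first 10% of every other flag, and the same user always gets the same answer on every server.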

I put this in another comment, but decoupling code releases from feature releases can still be achieved while storing feature flags as code. You need to build out a separate pipeline and release process; with that, you can still get all the advantages you listed.

I think the key difference is if you consider feature flags to be configuration and believe in configuration as code as a best practice.

I think this is the right approach in the long term. Letting people modify configuration in production in real time, without an obvious way to roll back to a previously known working state, is an outage waiting to happen.
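A sketch of what "an obvious way to roll back" can mean: keep every published configuration as an immutable version, and implement rollback as re-publishing an earlier version, much like `git revert`. `FlagConfigHistory` is a hypothetical in-memory stand-in for whatever store a real system would use.

```python
from copy import deepcopy


class FlagConfigHistory:
    """Append-only history of flag configurations (hypothetical, in-memory).

    Every change becomes a new version. Rolling back re-publishes an earlier
    version as a new entry, so the history itself is never rewritten.
    """

    def __init__(self, initial: dict):
        self._versions = [deepcopy(initial)]

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    @property
    def current(self) -> dict:
        return deepcopy(self._versions[-1])

    def publish(self, config: dict) -> int:
        self._versions.append(deepcopy(config))
        return self.version

    def roll_back_to(self, version: int) -> int:
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        return self.publish(self._versions[version])


history = FlagConfigHistory({"new_checkout": {"rollout": 0}})
history.publish({"new_checkout": {"rollout": 50}})  # version 1: the bad change
history.roll_back_to(0)                             # version 2: copy of version 0
print(history.version, history.current)  # 2 {'new_checkout': {'rollout': 0}}
```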

We certainly see workflows where defining everything as code can be powerful. As we've been chatting about this internally today, we have ideas of how you could define all your features / audiences / variables as code and only deploy changes using a Terraform interface.

I feel like, ultimately, this overly limits the reach of feature flags within an organization: it makes it more difficult for most engineers who are not comfortable with Terraform to deploy their features, and it limits the ability of product / marketing / project managers / QA to schedule releases, configure experiments, test features, modify audiences, etc.

Modifying configurations certainly can cause production issues, which is why we have built tools like our variable schemas and strong typing for our variables, restricting value changes to known values. With our CI / CLI / IDE integrations, we make it visible what values variables can be set to, when variables are added and removed, and when PRs modify variables that are enabled in production. This is all in addition to the existing permissions / change logs / rollback functionality we are improving on. One of our core responsibilities is to make changing Feature Flag values as safe as possible. We know that flags can quickly become tech debt, so we also preach that feature flags should be as easy to remove from your code as they are to add.
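To illustrate the kind of check a variable schema enables, here is a minimal validator that rejects values of the wrong type or outside an allowed set. The `VariableSchema` shape and `validate_change` are assumptions for illustration, not DevCycle's schema format or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VariableSchema:
    """Hypothetical schema for one flag variable."""
    kind: type
    allowed: set = field(default_factory=set)  # empty means any value of `kind`


def validate_change(schema: VariableSchema, value: Any) -> list[str]:
    """Return a list of problems with a proposed value (empty if it is valid)."""
    # bool is a subclass of int in Python, so reject it explicitly for int variables.
    wrong_type = not isinstance(value, schema.kind) or (
        schema.kind is int and isinstance(value, bool)
    )
    if wrong_type:
        return [f"expected {schema.kind.__name__}, got {type(value).__name__}"]
    if schema.allowed and value not in schema.allowed:
        return [f"{value!r} is not one of {sorted(schema.allowed)}"]
    return []


button_color = VariableSchema(str, {"blue", "green"})
print(validate_change(button_color, "blue"))  # []
print(validate_change(button_color, "red"))   # ["'red' is not one of ['blue', 'green']"]
```

The same check works whether the proposed value arrives from a UI, an API call, or a pull request that edits a flag file.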

A fully featured feature management platform should be with you from your IDE / CLI, Code Review / CI, Deployment Pipelines / Alerting and Monitoring, and integrated with your Project Management and Product Analytics. It should be visible to your team throughout their workflows what features are enabled across all your environments.

I don't think using source control is incompatible with this. You can persist state in source control, and have a separate deployment pipeline that is triggered only for updates to feature flag files. It is also deployed separately; e.g., to a key-value store.

Prefab has something that you could use in more or less this way. The value of a flag can come from different sources, with a defined order of precedence. The base levels are default YAML files, which would be committed in your project, and we think of live values from the server as a "potential override". So if you only wanted things in git, you could just not use the UI. Details in https://docs.prefab.cloud/docs/explanations/bootstrapping
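The layered-precedence idea can be sketched with Python's `collections.ChainMap`, which returns the value from the first mapping that contains a key. The layer names and keys are hypothetical, and this is not Prefab's API; their bootstrapping docs describe the real behaviour.

```python
from collections import ChainMap

# Hypothetical layers, highest precedence first. In practice repo_defaults
# would be loaded from YAML files committed to the project; plain dicts
# stand in for them here.
repo_defaults = {"checkout.new_flow": False, "search.v2": False}
live_overrides = {"checkout.new_flow": True}  # values pushed from the server

config = ChainMap(live_overrides, repo_defaults)
print(config["checkout.new_flow"])  # True  (live override wins)
print(config["search.v2"])          # False (falls through to the repo default)

# A git-only setup: no live layer, so the committed defaults are the truth.
git_only = ChainMap({}, repo_defaults)
print(git_only["checkout.new_flow"])  # False
```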

I'm spiking on some similar/adjacent things right now, actually, if you want to jam. I won't claim that what exists today is very polished for your usage pattern.

How did you represent the feature flags in source control?

Just a config file defining the flag, how it was targeted at users or other entities, how much it was rolled out, etc.