by Weizilla 3832 days ago
Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

In this example, it's only six simple values but what happens when you have 10 or 20? Or you have 10 applications with six values each that need to be deployed to four different environments?

Or what if multiple teams are making concurrent changes? What if some application starts failing due to a recent variable change and you want to revert it or track down who made the change?

I feel like once your application grows to more than just a few simple values, you end up creating a big file to populate these values, and you are back to using configuration files.

5 comments

Environment variables are just a mechanism, or transport, for getting information from the environment to your process. You don't version or manage them, any more than you'd version or manage command line flags. Change control is the responsibility of whatever system sets the variables, or execs the template that produces the run script, or whatever. You don't actually set env vars on the host.
Using environment variables for everything is wrong too. API keys and other sensitive information should be in environment vars. Non-private information should definitely be in config files.

If you need the flexibility of environment variables for a semi-configurable non-secure variable, use them to override a sensible default.
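
A minimal sketch of that override pattern in Python. The `APP_` prefix, the setting names, and `load_settings` are all hypothetical, chosen only for illustration:

```python
import os

# Hypothetical defaults that would normally live in a checked-in config file.
DEFAULTS = {"log_level": "info", "worker_count": "4"}


def load_settings(environ=os.environ, prefix="APP_"):
    """Return the defaults, overridden by any matching PREFIX + KEY variable."""
    settings = dict(DEFAULTS)
    for key in DEFAULTS:
        env_name = prefix + key.upper()
        if env_name in environ:
            settings[key] = environ[env_name]
    return settings
```

Calling `load_settings({"APP_LOG_LEVEL": "debug"})` changes only `log_level`; everything else keeps its default, so the file stays the source of truth for non-secret values.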

Can someone explain to me why env vars > filesystem for secrets? They seem equivalent in most ways that actually matter.

In general 12-factor gets my hackles up as it comes across as dictatorial without explaining why. Even when I'm wrong I like to be gently convinced rather than hit over the head with a rule book. Can someone point me to an extensive source that clearly justifies each factor? Ideally with an actual debate about each point (as this often surfaces the strongest parts of the case for something).

I have a tremendous amount of experience with the 12 Factor book, having worked with it at Heroku for 6 years. I am also working on an open source 12 Factor platform called Convox.

One reason the factors are presented as prescriptive is that apps that don't do this won't work on Heroku.

Is there a specific factor you'd like to deep dive into?

I'll pick one to start: Environment.

There are many ways you can set and read configuration for an app: env, config files, config tools like Chef or Puppet, or a config database like ZooKeeper or etcd. If we are talking about config like a database URL, you could also use a service discovery system.

Env represents the simplest contract between your app and whatever platform is running it (the OS, Docker, Heroku, ECS).

If the platform can update env and restart the processes to get the new settings, no other config management is necessary.

It's UNIX, it's simple, and it helps you bootstrap any more specialized config management if you need it (set ZOOKEEPER_URL or CHEF_SERVER_URL).
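
As a rough sketch of that bootstrapping idea: the process reads one variable from env and uses it to locate the heavier config system. The URL format and the `config_backend` helper are assumptions made for this example, not part of any platform's API:

```python
import os
from urllib.parse import urlsplit


def config_backend(environ=os.environ):
    """Pick a config source based on which bootstrap variable is set."""
    for name in ("ZOOKEEPER_URL", "CHEF_SERVER_URL"):
        url = environ.get(name)
        if url:
            parts = urlsplit(url)
            return {"source": name, "host": parts.hostname, "port": parts.port}
    # No specialized system configured: plain env is the whole config.
    return {"source": "env", "host": None, "port": None}
```

The app only ever needs the env contract; whether a config server exists behind it is decided by whoever sets the variable.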

So ENV feels like a factor to become very prescriptive about.

The biggest debate I can see is whether ENV is sufficient to build our microservices on, or whether service discovery "magic" is necessary too, e.g. ZooKeeper, Airbnb SmartStack or Docker Ambassador containers.

For the vast majority of apps, ENV is sufficient.

I personally still build my more complex apps around ENV and avoid needing a service discovery system at all costs. The added complexity and operational work aren't worth it to me.

I have a strong hunch that service discovery won't become an app development pattern that everyone uses until a managed platform (like Heroku) offers it. Perhaps this is where Docker, Swarm and Tutum are headed.

I don't like env vars for secrets: They tend to be easier to leak out of your process, especially via execing child processes. At least with files you can open them CLOEXEC.
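
A small Python sketch of the difference, assuming a POSIX system. The helper names are made up for this example. A child process inherits the parent's environment unless you scrub it, while a descriptor opened with `O_CLOEXEC` is closed automatically on exec:

```python
import os
import subprocess
import sys


def child_sees_env(name, value):
    """Spawn a child process and report whether it inherited the variable."""
    env = dict(os.environ, **{name: value})
    code = f"import os; print(os.environ.get({name!r}, ''))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == value


def open_secret_cloexec(path):
    """Open a file so the descriptor is not passed on to exec'd children."""
    # O_CLOEXEC is POSIX-only, hence the getattr fallback. Python already
    # creates non-inheritable descriptors by default (PEP 446), so the flag
    # mainly makes the intent explicit.
    return os.open(path, os.O_RDONLY | getattr(os, "O_CLOEXEC", 0))
```

`child_sees_env("DEMO_SECRET", "s3cr3t")` returns `True`: the secret reached the child with no extra effort, which is exactly the leak being described.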

Files on disk have the problem of being persistent, though, and of being governed by Unix permissions rather than scoped to the one process you explicitly hand the env variables to.

The solution I work on is to keep files in a non-persistent filesystem that audits access and ensures tight permissions (Keywhiz), though in many cases a tmpfs and auditd will do the same.

Anybody care to comment why this was downvoted?
Because your environment variables can (and should) be defined by a config somewhere, but passing the information via the process environment allows more flexibility than requiring access to a file. Someone said it well elsewhere: env variables are a transport, not the storage of the config information.
You effectively build your configuration file into the thing that knows how to run your container. If you're running Kubernetes, this is either a secret or the replication controller definition file. For docker-compose, this is the `docker-compose.yml` file. Or it's the script that starts your container.

But it's pretty common to put service credentials into a config file, so it's an anti-pattern to version-control them. It's _way_ safer not to, which means you shouldn't be version-controlling the thing that runs your container? This is sort of tricky. We're doing it by volume-mapping a non-version-controlled file for database credentials, and storing the rest of the configuration in the database.

In CF-land the most common pattern I've seen for important keys is a "secrets" repository which is merged with the base config at push time.
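
A sketch of what such a push-time merge could look like, with both configs represented as plain dictionaries. `merge_config` is a hypothetical helper, not something Cloud Foundry provides:

```python
def merge_config(base, secrets):
    """Recursively overlay secrets onto base without mutating either input."""
    merged = dict(base)
    for key, value in secrets.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

The base config can then live alongside the source while the secrets overlay stays in its own, more tightly controlled repository.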
With Ansible you have an encrypted vault file that stores your secrets. Similar principle I guess.
> Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

We're doing this: the env vars are stored as a stage/container/key hierarchy in version-controlled eyaml files (yaml with encryption at the value level, nice for git diffs). At deployment the eyaml gets decrypted by ops or Jenkins and converted into a container env map (in our case a Kubernetes resource controller).
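
Roughly, the conversion step might look like the following sketch. The data is hypothetical and already decrypted; the actual eyaml layout and tooling are not shown in this thread:

```python
def env_map(config, stage, container):
    """Build the env map for one container from a stage/container/key tree."""
    try:
        values = config[stage][container]
    except KeyError as missing:
        raise KeyError(f"no config for {stage}/{container}") from missing
    # Env values are always strings, so coerce numbers and booleans.
    return {key: str(value) for key, value in values.items()}


# Hypothetical, already-decrypted data standing in for the eyaml files.
CONFIG = {
    "staging": {"web": {"DATABASE_URL": "postgres://staging-db/app", "WORKERS": 2}},
    "production": {"web": {"DATABASE_URL": "postgres://prod-db/app", "WORKERS": 8}},
}
```

Because the hierarchy is keyed by stage first, a git diff of the file shows exactly which environment a change touches.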

Additionally we tag deployed containers with the config's git hash to have reproducible deployments, which is actually pretty useful. (Again we leverage Kubernetes labels, but this principle could be applied to other orchestration tech, I guess.)

> Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

If you're using Cloud Foundry, you put them in your manifest.yml and check that into source control. When you do `cf push`, they'll be updated.

Disclaimer: I work for Pivotal, who donate the most engineering to CF.

So how is that different from, or better than, putting them in the source?
Different repos with different access credentials to create another layer of separation between secrets and source. Defence in depth.
You shouldn't put sensitive data in your repo.