Hacker News
by palotasb 1217 days ago
I find the Twelve-Factor App's design of using environment variables for configuration (https://12factor.net/config) unintuitive and perhaps a bad design choice. I believe environment variables are bad for the same reasons global variables are bad.

Pros listed for env vars include not committing them to the repo and not encouraging grouping them together as environments such as dev, staging or prod. I don't agree that these are always good goals, but if they are, the same can be achieved with config files: don't commit them to the repo, generate them on the fly.

The existence and prevalence of .env files is proof that using environment variables as an alternative has failed. Using Twelve-factor as a reference and .env files at the same time is a bit of a contradiction.

Another alternative to consider for both env vars and config files is command line arguments.

3 comments

> I believe environment variables are bad for the same reasons global variables are bad.

Global variables are bad, but environment variables are actually more like dynamic variables: http://www.chriswarbo.net/blog/2021-04-08-env_vars.html

Dynamic scope is useful for things the caller knows better than the implementor, e.g. configuration, credentials, etc.
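A minimal Python sketch of that dynamic-scope behaviour, assuming a hypothetical helper named run_child: the caller layers a value over its own environment for the duration of one child process, and the caller's own environment is left untouched.

```python
import os
import subprocess
import sys


def run_child(extra_env):
    """Run a child Python process that prints FOO (hypothetical helper).

    extra_env is layered over a copy of the current environment, so the
    override is visible only to the child, much like a dynamic binding.
    """
    child_env = {**os.environ, **extra_env}  # a copy; os.environ is not modified
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('FOO', '<unset>'))"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The child never has to declare FOO as an input; it simply sees whatever its caller bound.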

> Another alternative to consider for both env vars and config files is command line arguments

The two things which distinguish CLI arguments from env vars are:

- Env vars are usually readable from anywhere, whilst CLI args are usually passed around explicitly (more like lexical scope)

- Env vars are inherently key=value pairs, whilst CLI arguments are better suited to checking presence/absence (e.g. 'foo' versus 'foo --force'), parameters which don't need names (e.g. 'foo myFile') and variable-length lists of parameters (e.g. 'foo file1 file2 file3')
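As a rough illustration of the second point, here is a sketch using Python's argparse. The program name foo and the argument names are taken from the examples above and are otherwise hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical 'foo' command."""
    parser = argparse.ArgumentParser(prog="foo")
    # Presence/absence: 'foo' versus 'foo --force'
    parser.add_argument("--force", action="store_true")
    # Unnamed, variable-length parameters: 'foo file1 file2 file3'
    parser.add_argument("files", nargs="*")
    return parser


# Parse an explicit list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["--force", "file1", "file2", "file3"])
```

Expressing the same thing with env vars would need conventions layered on top, such as a separator for the file list and an agreed meaning for an empty value.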

Hi Chris! Thanks for the link, it's an enlightening read, I learned about dynamic variable scopes today.

It did make me partially change my mind about "environment variables are bad for the same reasons global variables are bad." I concur that environment variables are more like constants than mutable globals, even in my language of choice, Python. If you only use them at process boundaries, they are fine; I admit to using them that way too:

  import argparse
  import os
  parser = argparse.ArgumentParser()
  parser.add_argument("--foo", default=os.environ.get("FOO"))

If they are used at a boundary within a process, however:

  def foo_function():
    return foo_implementation(os.environ.get("FOO"))

Then testing foo_function() becomes a problem because os.environ isn't dynamically scoped within the process. Each test case can set os.environ["FOO"], but then the tests have mutable globals even if the app doesn't. I know three ways to solve this, each with its pros and cons:

- 1. Treat the script as a black box, only test the script as a whole -- or not at all. How env vars are used internally doesn't matter. Works well for smaller scripts.

- 2. Keep the code as is, test functions individually by setting and resetting the environment variables in each test setup and teardown. Don't run tests in parallel.

- 3. Push all environment variable usage to process boundaries and make all inner functions pure functions that are only affected by their explicit input parameters. If needed, I even make standard in/out/error, logger instances and other similar globals explicit parameters or class members. Requires more boilerplate, works better for more complex projects. Testing any behavior becomes easier.
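For reference, option 2 can be sketched with the standard library's unittest.mock.patch.dict, which restores os.environ when the block exits. The body of foo_implementation here is a hypothetical stand-in, since the real one isn't shown.

```python
import os
import unittest
from unittest import mock


def foo_implementation(value):
    # Hypothetical stand-in for the real implementation.
    return f"foo={value}"


def foo_function():
    return foo_implementation(os.environ.get("FOO"))


class FooFunctionTest(unittest.TestCase):
    def test_reads_foo_from_environment(self):
        # patch.dict sets FOO for this block and restores os.environ afterwards.
        with mock.patch.dict(os.environ, {"FOO": "bar"}):
            self.assertEqual(foo_function(), "foo=bar")

    def test_missing_foo(self):
        # Changes made inside the block, including removals, are undone on exit.
        with mock.patch.dict(os.environ):
            os.environ.pop("FOO", None)
            self.assertEqual(foo_function(), "foo=None")
```

The patch is still process-wide while it is active, so tests that run in parallel threads within one process can observe each other's values.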

I prefer to go with option #1 or #3, as #2 feels dirty and makes my test cases smell of workarounds. #3 could look like this, with a few details omitted:

  import argparse
  import os

  parser = argparse.ArgumentParser()
  parser.add_argument("--foo", default=os.environ.get("FOO"))
  args = parser.parse_args()

  def foo_function(foo_value):
    return foo_implementation(foo_value)

  def main():
    ...
    foo_result = foo_function(foo_value=args.foo)
    ...

  ...
  
To agree with you, it would be great if the ex-globals-turned-parameters I'm passing around in option #3 were dynamically scoped. Not shown in the example above, but imagine that instead of printing to sys.stderr, functions receive an stderr: io.IOBase parameter or a custom dataclass that contains such a field. The point is to get rid of mutable global state in all cases.
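A sketch of that idea, assuming a hypothetical warning message and the same stand-in foo_implementation as before. The annotation uses io.TextIOBase, the text-stream subclass of io.IOBase, because print() writes strings.

```python
import io


def foo_implementation(value):
    # Hypothetical stand-in for the real implementation.
    return f"foo={value}"


def foo_function(foo_value, stderr: io.TextIOBase):
    """Depends only on its parameters, including where diagnostics go."""
    if foo_value is None:
        print("warning: FOO is not set", file=stderr)
    return foo_implementation(foo_value)


# A test passes an in-memory stream; main() would pass sys.stderr instead.
captured = io.StringIO()
result = foo_function(None, stderr=captured)
```

No global is patched or reset, so a test like this is safe to run in parallel with others.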

To disagree with you, I think the correct term for "things the caller knows better than the implementor" is parameters. I'm not sure there's a benefit to preferring dynamic scope for parameters when most languages default to lexical scope.

About your last two points, I somewhat agree and somewhat still disagree: "CLI args are usually passed around explicitly" -- I think this is a pro, not a con. Further, CLI arguments are strictly more flexible than environment variables: most argument parsing libraries support key-value parsing in addition to boolean flags and lists.
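Key-value parsing can be sketched with argparse like this. The --set option and the key_value helper are hypothetical names, not something argparse provides out of the box.

```python
import argparse


def key_value(text):
    """Parse 'KEY=VALUE' into a (key, value) tuple, splitting on the first '='."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {text!r}")
    return key, value


parser = argparse.ArgumentParser(prog="foo")
parser.add_argument(
    "--set",
    dest="settings",
    metavar="KEY=VALUE",
    type=key_value,
    action="append",  # collect every occurrence of --set
    default=[],
)

# Parse an explicit list instead of sys.argv so the example is self-contained.
args = parser.parse_args(["--set", "FOO=bar", "--set", "URL=http://x/?a=b"])
settings = dict(args.settings)
```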

However, I agree with your overall point as I understand it: environment variables used at process boundaries behave like dynamically scoped variables, and these are fine, as long as they stay at process boundaries.

> "CLI args are usually passed around explicitly" -- I think this is a pro, not a con.

Sure; I never said it's a con. They have different characteristics, and are both useful in certain situations :)

> I think the correct term for "things the caller knows better than the implementor" is parameters.

True; that's also the name Racket gives to dynamically-scoped variables https://docs.racket-lang.org/guide/parameterize.html

In fact, Racket uses a parameter (dynamically-scoped variable) to store the environment. This is actually slightly annoying, since the parameter is one big hashmap of all the env vars, but I usually want to override them individually. One of my Racket projects actually defines a helper function that overrides individual env vars while copying the rest of the environment: https://github.com/Warbo/theory-exploration-benchmarks/blob/...
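For readers who don't know Racket, a rough Python analog of such a helper might look like the sketch below. It is not the code from the linked project; override_env is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def override_env(**overrides):
    """Override individual env vars for the duration of a with-block."""
    saved = dict(os.environ)  # snapshot of the whole environment
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore the snapshot, dropping anything added inside the block.
        os.environ.clear()
        os.environ.update(saved)
```

Unlike a Racket parameter, this mutates process-wide state while the block is active, so it only approximates dynamic scope in single-threaded code.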

Global variables are primarily bad when they're mutable. Environment variables are (usually anyway) global constants, which are indispensable.

And generating config files sounds like a pain, probably more complexity than a lot of us really need. Though I don't disagree that it's a little silly to take env files too seriously as a format.

Files are written to disk. In a cloud setup, that means possibly leaking credentials when your disk is re-assigned to another tenant.

Yes, cloud providers are supposed to properly erase hard drives before reassigning them. Can you be 100% sure they do, though?

With environment variables in RAM, the problem is moot. Committing and/or generating .env files in a production system completely misses the point.

If you're worried about that, you should be worried about what gets written to memory too. You have little control over where virtual memory ends up actually storing your bits and bytes. Unless you run without swap, but that's just a bad idea overall.

Lots of distributions clean swap on boot or shutdown precisely for security reasons. Also, cleaning a relatively small swap is faster than zeroing a full disk.

Your argument does not validate the use of an easily recoverable .env file. Recovering a .env file is easier than recovering virtual memory.

"Files are written to disk" is not strictly true. In the use case where the config contains (hopefully short-lived) credentials, one would pass them in a temporary file that usually only lives in RAM (unless /tmp doesn't use tmpfs or the temporary config file is put somewhere else) and of course doesn't get committed to the repo. (I'm not sure if you meant git commit or filesystem commit.)
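A sketch of the temporary-file approach, assuming a JSON config and a hypothetical helper named write_temp_config. Whether the file actually stays in RAM depends on the temp directory being backed by tmpfs, which this code does not check.

```python
import json
import os
import tempfile


def write_temp_config(settings, directory=None):
    """Write settings to a private temporary file and return its path.

    tempfile.mkstemp creates the file readable and writable only by the
    creating user. The caller is responsible for deleting it afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(settings, handle)
    return path
```

The returned path can then be handed to the program that needs the credentials, for example through a command line argument.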

I sometimes find secrets to be safer inside config files since so many times the environment variables get dumped into logs – hence all the popular CI/CD products have features to try to scrub such secrets from their logs.

I agree about not using .env files in production; I'd not use them at all.

This is an advantage of SQLite as a config store as well: an initial db config file is augmented in memory with secrets and is accessible from all major languages, without relying on the vagaries of the filesystem (Windows vs. Linux tmp mount points). It is also easy to have multiple switchable configurations depending on environment, test mode (integration tests after deployment, etc.) or customer.
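A minimal sketch of such a store using Python's sqlite3 module. The table layout, the profile names and both helper functions are assumptions for illustration, and the base settings are passed in as a dict here rather than read from an initial db file.

```python
import sqlite3


def build_config_db(base, secrets):
    """Build an in-memory config store from {profile: {key: value}} dicts.

    Secrets are inserted after the base settings and exist only in memory.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE config ("
        "profile TEXT NOT NULL, key TEXT NOT NULL, value TEXT, "
        "PRIMARY KEY (profile, key))"
    )
    rows = [
        (profile, key, value)
        for source in (base, secrets)
        for profile, settings in source.items()
        for key, value in settings.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO config VALUES (?, ?, ?)", rows)
    return conn


def get_setting(conn, profile, key):
    """Return the value for key in the given profile, or None if absent."""
    row = conn.execute(
        "SELECT value FROM config WHERE profile = ? AND key = ?",
        (profile, key),
    ).fetchone()
    return None if row is None else row[0]
```

Switching configurations then amounts to passing a different profile name, such as "dev" or "prod", to get_setting.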