by giaour 1465 days ago
Sometimes you need to support something more flexible than JSON but just can't hand your (non-dev) users all the footguns that come with a Turing complete configuration language. "Key/Value" makes the problem sound simple, but maybe you're writing a rules engine, and the values are expressions that will be evaluated at runtime. It's uncommon, but the need isn't impossible to imagine.

I'm personally of the opinion that using a scripting language is the worst option. Going that route is giving up on most forms of static analysis, since you can't inspect configuration values without executing untrusted code.

> Sometimes you need to support something more flexible than JSON but just can't hand your (non-dev) users all the footguns that come with a Turing complete configuration language. "Key/Value" makes the problem sound simple, but maybe you're writing a rules engine, and the values are expressions that will be evaluated at runtime. It's uncommon, but the need isn't impossible to imagine.

> I'm personally of the opinion that using a scripting language is the worst option. Going that route is giving up on most forms of static analysis, since you can't inspect configuration values without executing untrusted code.

Sounds like the original case of the "inner-platform effect". A "custom configuration file" format for a "rules engine" is a scripting language - you will almost certainly make it accidentally Turing complete even if you were trying not to - and it's one that's even less susceptible to analysis than a standard scripting language.

The least-bad solution is an "inner" DSL in your language, IME, with lots of support code to make it as nice as possible (admittedly my experience is mainly in languages that make this easy) - that way you get tooling for free and you can leverage existing static analysis tools. Yes, your users will find ways to shoot themselves in the foot, but they were always going to.

> A "custom configuration file" format for a "rules engine" is a scripting language

I see what you mean, but if a project is defining its own configuration language, it can impose as many restrictions as it wants on what expressions are allowed. There are some great examples of non-Turing complete configuration DSLs out in the world, though the most successful examples seem to be used by application frameworks (protocol buffers, GraphQL schema language, Thrift IDL, Avro IDL, Smithy) and infrastructure-as-code projects (Terraform HCL, Azure Bicep).

At a previous job, there were a couple popular "inner DSL" projects, but one was written from the beginning to eventually evaluate to a JSON document, and the other was rewritten to do so (rather than be interpreted and acted upon iteratively) because TypeScript and Ruby scripts could just embed too much arbitrary complexity to be reasonably analyzed in toto by humans once projects reached a certain size.
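
To make "evaluate to a JSON document" concrete, here is a minimal sketch in Python (the projects mentioned above used TypeScript and Ruby, and nothing here reflects their actual APIs). `Expr`, `var`, and the `"op"`/`"args"` layout are all hypothetical names invented for illustration. The point is that the host-language script runs once and produces plain data, and that data is what gets reviewed and analyzed.

```python
import json


class Expr:
    """Hypothetical inner-DSL node: it records structure instead of computing."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    # Operators build new nodes rather than evaluating anything.
    def __gt__(self, other):
        return Expr("gt", self, other)

    def __mod__(self, other):
        return Expr("mod", self, other)

    def __eq__(self, other):
        return Expr("eq", self, other)

    def to_json(self):
        # Recursively convert the tree into plain dicts and lists.
        return {
            "op": self.op,
            "args": [a.to_json() if isinstance(a, Expr) else a for a in self.args],
        }


def var(name):
    """Hypothetical helper that introduces a free variable."""
    return Expr("var", name)


x = var("x")
config = {"rules": [(x > 1).to_json(), (x % 2 == 1).to_json()]}
document = json.dumps(config, sort_keys=True)
```

Whatever loops or helpers the script used to build `config`, a reviewer only has to read `document`, which is inert data.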

> it can impose as many restrictions as it wants on what expressions are allowed

Sure, but it's very easy to accidentally be Turing complete. You added loops? Boom. You added some kind of alias / extract-common-value feature? Boom. Etc. Of course you can forcibly restrict the implementation (I wrote a Y combinator in one of them and found that the interpreter would detect recursion at runtime and refuse to process it) but at that point you're making your language inconsistent, which is even worse.

> protocol buffers, GraphQL schema language, Thrift IDL, Avro IDL, Smithy

Those aren't "rules engine"s though. Once you get to the point of having expressions in your language, you slide down the slippery slope pretty quickly, IME.

> Those aren't "rules engine"s though

Yeah, I don't know why I picked rules engines as an example of things that are hard to configure with JSON. My day job is working on an infrastructure-as-code DSL, and I previously worked on an RPC framework IDL. Both of those projects are replacing an alternative in a traditional serialization language (JSON and XML, respectively) where users spent a lot of their time fighting against JSON or XML gotchas.

The IaC language supports expressions but not named expressions, so we've been able to avoid Turing completeness so far.

That may not be enough; you wouldn't be the first supposedly-non-Turing-complete language where it's actually possible to express a Y combinator and use that to embed arbitrary computations.
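
For anyone who hasn't seen the trick, here is a rough sketch in Python, used only as a stand-in for any language with anonymous functions; it says nothing about the IaC language in question. Strictly speaking this is the Z combinator, the variant of Y that works under eager evaluation. The names `Z` and `factorial` exist only for readability; the second form uses no top-level names at all.

```python
# Z combinator: the strict-evaluation variant of the Y combinator.
# It gives an anonymous function a handle on itself without any named definition.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Recursive factorial written without referring to its own name.
factorial = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

# The same computation as one expression with no named bindings.
anonymous_result = (
    lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))
)(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))(5)
```

So banning named expressions only helps if the language also lacks first-class functions, or anything else that allows self-application.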

Expressions that will be evaluated at runtime are just strings, no?

  {"rules": [
    "x > 1",
    "x % 2 = 1"
  ]}
So you will have an "expression parser" to parse your rules, validate them, and execute them. But at least you do not have the edge cases of everything else.
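
A minimal sketch of that parse / validate / execute split, assuming the rules use Python's own expression syntax so the standard `ast` module can do the parsing (under that assumption the second rule has to be written `x % 2 == 1`, since a single `=` is a syntax error). `parse_rule`, `evaluate`, and the operator whitelist are hypothetical names for illustration, not an existing library.

```python
import ast
import operator

# Hypothetical whitelist: anything outside these tables is rejected.
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Mod: operator.mod}
COMPARE = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.Eq: operator.eq,
           ast.NotEq: operator.ne, ast.GtE: operator.ge, ast.LtE: operator.le}


def parse_rule(source, variables=("x",)):
    """Parse and validate a rule string without evaluating it."""
    tree = ast.parse(source, mode="eval").body
    _check(tree, set(variables))
    return tree


def _check(node, variables):
    # Accept numeric literals, known variables, and whitelisted operators only.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return
    if isinstance(node, ast.Name) and node.id in variables:
        return
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
        _check(node.left, variables)
        _check(node.right, variables)
        return
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in COMPARE):
        _check(node.left, variables)
        _check(node.comparators[0], variables)
        return
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


def evaluate(node, env):
    """Evaluate a tree that has already passed parse_rule."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        return BINARY[type(node.op)](evaluate(node.left, env),
                                     evaluate(node.right, env))
    # Only single-operator comparisons get past _check.
    return COMPARE[type(node.ops[0])](evaluate(node.left, env),
                                      evaluate(node.comparators[0], env))


rules = [parse_rule(text) for text in ("x > 1", "x % 2 == 1")]
```

Because validation happens on the tree before anything runs, a rule such as `__import__('os')` is rejected at load time with a `ValueError` instead of being executed.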

At this point you're already writing your own parser to parse the expressions from the strings. Why not cut out the JSON completely and make it way more readable?

  rules = [
    x + 1,
    x % 2 == 1
  ]
Those are simple expressions (no multiline blocks, no named symbols, no expressions that call other expressions), and you're still forced to treat them as opaque strings when you use a serialization language for configuration.

That may be the best approach for a given project! But if you're using configuration to define an API contract, cloud infrastructure, or a query, you might be better off using a purpose-built DSL like protocol buffers, HCL, or SQL, respectively. That approach can let you define a greater level of expressivity than is allowed in a serialization language like JSON or XML without letting config authors write scripts of arbitrary computational complexity (like they would be able to with config in a scripting language).

This sort of hack with control logic was done in XML with Ant. It was a bad idea then and it still is.