by lmm 1477 days ago
> Sometimes you need to support something more flexible than JSON but just can't hand your (non-dev) users all the footguns that come with a Turing complete configuration language. "Key/Value" makes the problem sound simple, but maybe you're writing a rules engine, and the values are expressions that will be evaluated at runtime. It's uncommon, but the need isn't impossible to imagine.

> I'm personally of the opinion that using a scripting language is the worst option. Going that route is giving up on most forms of static analysis, since you can't inspect configuration values without executing untrusted code.

Sounds like the original case of the "inner-platform effect". A "custom configuration file" format for a "rules engine" is a scripting language - you will almost certainly make it accidentally Turing complete even if you were trying not to - and it's one that's even less susceptible to analysis than a standard scripting language.

The least-bad solution is an "inner" DSL in your language, IME, with lots of support code to make it as nice as possible (admittedly my experience is mainly in languages that make this easy) - that way you get tooling for free and you can leverage existing static analysis tools. Yes, your users will find ways to shoot themselves in the foot, but they were always going to.


> A "custom configuration file" format for a "rules engine" is a scripting language

I see what you mean, but if a project is defining its own configuration language, it can impose as many restrictions as it wants on what expressions are allowed. There are some great examples of non-Turing complete configuration DSLs out in the world, though the most successful examples seem to be used by application frameworks (protocol buffers, GraphQL schema language, Thrift IDL, Avro IDL, Smithy) and infrastructure-as-code projects (Terraform HCL, Azure Bicep).

At a previous job, there were a couple of popular "inner DSL" projects. One was written from the beginning to eventually evaluate to a JSON document, and the other was rewritten to do so (rather than being interpreted and acted upon iteratively), because TypeScript and Ruby scripts could embed too much arbitrary complexity to be reasonably analyzed in toto by humans once projects reached a certain size.
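To make the "evaluate to a JSON document" approach concrete, here is a minimal sketch in Python (not the TypeScript or Ruby projects mentioned above). The `resource` helper, the `build_config` function, and the field names are all hypothetical, invented purely for illustration. The point is only that the host language runs first and the engine sees nothing but the resulting data.

```python
import json

def resource(kind, name, **props):
    """Hypothetical helper: describe one resource as plain data."""
    return {"kind": kind, "name": name, "props": props}

def build_config():
    # Arbitrary host-language logic (loops, helpers) is allowed here...
    regions = ["us-east", "eu-west"]  # made-up values
    return {
        "resources": [
            resource("bucket", f"logs-{region}", region=region, versioned=True)
            for region in regions
        ]
    }

# ...but only the fully evaluated document is handed to the engine,
# so humans and tools can inspect it without running any script.
document = json.dumps(build_config(), indent=2, sort_keys=True)
print(document)
```

Whatever complexity lives in the script, a reviewer can read or diff the emitted JSON on its own.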

> it can impose as many restrictions as it wants on what expressions are allowed

Sure, but it's very easy to accidentally be Turing complete. You added loops? Boom. You added some kind of alias / extract-common-value feature? Boom. Etc. Of course you can forcibly restrict the implementation (I wrote a Y combinator in one of them and found that the interpreter would detect recursion at runtime and refuse to process it), but at that point you're making your language inconsistent, which is even worse.
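For illustration, a runtime guard of the kind described might look roughly like the following Python sketch. The `$name` alias syntax and the `expand` function are assumptions made up for this example, not the behaviour of any particular tool.

```python
def expand(name, aliases, active=()):
    """Expand the hypothetical alias `name`, refusing recursive definitions."""
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError("recursive alias: " + chain)
    parts = []
    for token in aliases[name].split():
        if token.startswith("$"):
            # Track which aliases are currently being expanded.
            parts.append(expand(token[1:], aliases, active + (name,)))
        else:
            parts.append(token)
    return " ".join(parts)

print(expand("greeting", {"greeting": "hello $who", "who": "world"}))
```

The check is enforced by the interpreter rather than by the grammar, so a definition that parses fine can still be rejected when it is evaluated, which is the inconsistency complained about above.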

> protocol buffers, GraphQL schema language, Thrift IDL, Avro IDL, Smithy

Those aren't rules engines, though. Once you get to the point of having expressions in your language, you slide down the slippery slope pretty quickly, IME.

> Those aren't "rules engine"s though

Yeah, I don't know why I picked rules engines as an example of things that are hard to configure with JSON. My day job is working on an infrastructure-as-code DSL, and I previously worked on an RPC framework IDL. Both of those projects are replacing an alternative in a traditional serialization language (JSON and XML, respectively) where users spent a lot of their time fighting against JSON or XML gotchas.

The IaC language supports expressions but not named expressions, so we've been able to avoid Turing completeness so far.

That may not be enough; yours wouldn't be the first supposedly-non-Turing-complete language where it's actually possible to express a Y combinator and use that to embed arbitrary computations.
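As a sketch of why leaving out named expressions may not be enough: the Python below computes a factorial using only anonymous functions and application, via the Z combinator (the strict-evaluation variant of the Y combinator). It assumes the language has first-class anonymous functions, which the IaC language discussed above may well not have. The only name bound is `result`, and that is just there to hold the output.

```python
# No function here is named or refers to itself; recursion comes
# entirely from self-application inside the fixed-point combinator.
result = (
    # Z combinator: lambda f: (lambda x: f(lambda v: x(x)(v)))(same again)
    (lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v))))
    # The "body" receives a handle to itself as its first argument.
    (lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
)(5)

print(result)  # 120
```

If a configuration language admits an expression of this shape, removing `def`, `let`, or aliases does not remove recursion.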