Hacker News new | ask | show | jobs
by hitchdev 466 days ago
>The issue here is that schema validation is expressed in Python. The author contradicts himself when he argues, on the one hand, that Python shouldn't be used for configuration because it's too powerful: https://hitchdev.com/strictyaml/why-not/turing-complete-code... and on the other hand, that Python is really powerful for building schemas: https://hitchdev.com/strictyaml/why-not/json-schema/ .

Disclosure: I'm the author.

The difference is because full validation/parsing is a task that can rarely be always fully accomplished with JUST a non-turing complete schema. Every time I use JSON schema I have to add additional validation on top written in turing complete code.

This happened to me literally just an hour ago when I wanted to put a DSL in a field in a config file. json-schema (the "config" schema) doesn't let me write code to validate this and reject it. It's a string or it's not. With StrictYAML schemas written in code it's pretty straightforward to create a parser/validator that rejects invalid DSL with a meaningful error.

You might argue that "these rules bolted on top aren't part of the schema" or "this is validation that you can do after the json schema validates" but there is benefit to combining them - namely, code coherence and validation error consistency.

(there are also down sides - namely that json schema can be used in multiple languages. strictness comes at the expense of reusability).

In practice almost every schema I build I want to have stricter validation rules that are not enforceable with something like json-schema alone.

These are both instances of the law of least power. There are plenty of languages which are too powerful for the task at hand and plenty which are not powerful enough and people hack around and even rage against both. There are other "goldilocks" languages that are just right for the task at hand.

1 comments

> This happened to me literally just an hour ago when I wanted to put a DSL in a field in a config file. json-schema (the "config" schema) doesn't let me write code to validate this and reject it.

You can embed DSLs in CUE. It's a bit unwieldy because you have to essentially reproduce the DSL grammar in CUE, and it may not be performant, but yeah, it's doable. Can you provide more details?

> You might argue that "these rules bolted on top aren't part of the schema" or "this is validation that you can do after the json schema validates" but there is benefit to combining them - namely, code coherence and validation error consistency.

I would argue that it's a slippery slope. Consider v1 where an enum is statically defined as Employee or Manager. Then in v2 we add VP and CEO. Then in v3 actually the list of permitted titles needs to be fetched from a database populated by HR. Is it still correct to put this in configuration validation? What if the person writing the configuration doesn't have permissions to read from HR's database? So nothing should work?

>You can embed DSLs in CUE

CUE lets you embed functions too it looks like it's almost a programming language itself.

The closer a configuration language gets to a programming language the less of a reason I see for it to exist.

>I would argue that it's a slippery slope. Consider v1 where an enum is statically defined as Employee or Manager. Then in v2 we add VP and CEO. Then in v3 actually the list of permitted titles needs to be fetched from a database populated by HR. Is it still correct to put this in configuration validation?

No, coupling to a database would be bad design IMO, but grabbing those enums from other config files in the same folder that are parsed earlier I have done a lot.

I've also used libraries that provide lists of timezones and country codes as enums and plugged them in to the parser so you couldnt invent your own country code.

And Ive written validators that reference other bits of the config (e.g. the list of permitted titles is in another part of the config).

All of these things I would argue are good and useful and not worth sacrificing in exchange for preventing possible misuse (like coupling parsers to a DB).

I actually wrote this parser in the first place because I wanted to create a good metalanguage for tersely defining strongly typed executable specifications in YAML (i.e. Gherkin done right). Tons of stuff I wanted to strictly validate wouldnt have been possible with config-based schema validation and with YAML's weak, implicit typing it was a fucking mess.