by legulere 4223 days ago
The YAML format mentioned in the slides actually has the least syntactic clutter; however, parsing it is quite difficult.

Well, that's why you use a library.

What's important is that it's easy to read and write. And that's yet another reason to standardize your language, so that you don't need to trade legibility for parseability, because you'll only need to write the parser once.

The only problem I see with that is that some programs use Turing-complete configuration languages, while for most of them that'd be a bug, not a feature. One just cannot standardize all *nix tools on one configuration language... maybe two, but not one.

Because it simply is too complex for a format that emphasizes easy readability. Some easy-to-grasp subset of YAML might be a good choice.
I came up with such a format (called BML), for the same reasons. Here's an example that shows the entirety of the syntax:

    server
      path: /core/www/
      host: example.com
      port: 80
      service: true
      proxy
        host: proxy.example.com
        port: 8080
        authentication: plain
      description
        :Primary web-facing server
        :Provides commerce-related functionality
    
    server
      ...
      proxy host="proxy.example.com" port="8080"
        authentication: plain
Everything is a node, which can have a data value and zero or more child nodes. Nesting is determined by counting the amount of indentation on each line. It uses a counter so that if you want indents to be one tab, or two spaces, or four spaces, it will still work. (Being too rigid here makes the syntax very unfun for humans to write.)
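A rough Python sketch of the indentation counting described above. This is not the actual C implementation: the names `Node` and `parse_tree` are made up for illustration, it only understands bare `name` and `name: value` lines, and it does not try to reproduce the real rules for mismatched indentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def parse_tree(text):
    """Build a tree from indented lines; depth is relative, not a fixed width."""
    root = Node(name="")
    # Stack of (indent_width, node). The root sits below any real indent.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        stripped = raw.lstrip(" \t")
        indent = len(raw) - len(stripped)  # count leading whitespace characters
        # Pop until the top of the stack is strictly shallower: that is the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        name, _, value = stripped.partition(":")
        node = Node(name=name.strip(), value=value.strip())
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root
```

Because only "deeper than the parent" matters, a file indented with one tab and a file indented with four spaces produce the same tree.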

Once parsed, element-style (first proxy node) and attribute-style (second proxy node) nodes become identical and are treated the same (but with a flag in case you want to write out a modified file). They are fully interchangeable, so there are no attribute vs. element debates: just use what works best for readability. (This really is critical. Some document types would be ten times as long without attribute-style nodes.)

The syntax has no entities. foo="data" can capture any data that doesn't need quotes or newlines. foo: can capture any data that doesn't need a newline. foo\n:data\n:data can capture any data that doesn't need binary. foo\n:base64\n:base64 can capture absolutely anything.
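To make the three forms concrete, here is a small Python sketch that reads one node from its header line plus any `:` continuation lines. The function name `read_node` and the returned tuple shape are hypothetical, and keeping the single-line value raw (leading space included) is an assumption based on the markup leaving values unparsed.

```python
import re

# name="value" pairs; the value cannot contain a double quote or a newline.
ATTRIBUTE = re.compile(r'([A-Za-z0-9.-]+)="([^"]*)"')


def read_node(lines):
    """Return (name, value, attributes) for a header line and its ':' lines."""
    header = lines[0].lstrip(" \t")
    name = re.match(r"[A-Za-z0-9.-]+", header).group(0)
    rest = header[len(name):]
    value = ""
    attributes = {}
    if rest.startswith(":"):
        # foo: data -> single-line value, kept raw up to the end of the line
        value = rest[1:]
    else:
        # foo a="1" b="2" -> attribute-style child nodes
        attributes = dict(ATTRIBUTE.findall(rest))
    # Indented lines that begin with ':' form a multi-line value.
    block = [ln.lstrip(" \t")[1:] for ln in lines[1:]
             if ln.lstrip(" \t").startswith(":")]
    if block:
        value = "\n".join(block)
    return name, value, attributes
```

Binary data would go through the multi-line form after base64 encoding, with the application decoding it afterwards.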

Node names must be [A-Za-z0-9-.]{1,}, and are case-sensitive. The same node name can appear multiple times at any level, even at the root level. Ambiguity is resolved by the order of appearance for each node.
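A tiny Python illustration of the naming rule and of picking between repeated names by order of appearance. The helper names `is_valid_name` and `nth` are invented for this sketch, and children are modelled as plain `(name, value)` pairs rather than real nodes.

```python
import re

# Same character set as [A-Za-z0-9-.]{1,}, with the hyphen moved last.
NAME = re.compile(r"[A-Za-z0-9.-]+")


def is_valid_name(name):
    """True if the whole string is a legal node name."""
    return NAME.fullmatch(name) is not None


def nth(children, name, index=0):
    """Pick the index-th child called `name`, in order of appearance."""
    matches = [value for child_name, value in children if child_name == name]
    return matches[index]
```

Since names are case-sensitive, `server` and `Server` count as different nodes here.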

The data values of nodes are completely unparsed by the markup. The syntax knows no difference between strings, integers, floats, booleans, binary data, arrays, etc. The application parses the text however it wants. The library adds some convenience functions (.text() to strip surrounding whitespace, .integer() to get a number, .boolean() to decode true/yes/on vs false/no/off, etc.)
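The convenience functions could look roughly like this in Python. The `Value` class is a stand-in, not the library's API; matching the boolean words case-insensitively and raising on anything else are assumptions of this sketch.

```python
TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}


class Value:
    """Wraps the raw, unparsed text of a node's data value."""

    def __init__(self, raw):
        self.raw = raw

    def text(self):
        # Strip surrounding whitespace only; the content is untouched.
        return self.raw.strip()

    def integer(self):
        return int(self.text())

    def boolean(self):
        word = self.text().lower()  # assumption: case-insensitive
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError("not a boolean: %r" % self.raw)
```

The markup itself never decides a type; the application chooses which accessor to call on each node.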

File format is mandatory UTF-8 (no BOM.) Preferred line feeds are '\n', but '\r\n' is also permitted because Microsoft.

The implementation in C is about 8KB. Since everything has a marker, allocations are not necessary for the node names/values (but you will need to allocate the tree structure, obviously.) There's an accompanying path query syntax (à la XPath) that's another 6KB of code or so.

(All the edge cases are well-defined (mismatched indentation, mixing multi-line and child nodes, etc), but the post is getting a bit long.)

This was the best I could do at minimalism. Removing any functionality it has results in ruling out many use cases.

Haven't seen that before; indeed it is similar. I quite like that.

The limitation I see is, how do you store data values with spaces in them? If it allows quoted values, then how do you store values with quotes in them? And how do you store line feeds? Not having those rules out a lot of use cases.

Hmm, I can't remember about spaces, and you would need them for TXT records. I can't find anything in the documentation; I'd have to do some tests.
I really like this. Wish it would be widely adopted in place of YAML.

Have you written a spec and promoted it at all?

Thanks! I did write up a spec (currently not online after a host move), but I'm very bad at expressing grammars with e.g. EBNF, so it was a fairly verbose read.

I haven't really promoted it; I don't know how to promote my work tactfully (I tried promoting a few things on r/programming and the mods buried them right off the bat.)

I'd be very happy for any help in this regard, as well as for any suggestions on simplifying the parsing (the edge cases really are annoying to deal with.)