by vidarh 947 days ago
> that's not really closer to bnf so maybe it's not what you're looking for

I'm really toying with a variety of things. One of them is smaller/cleaner parsing of binary formats.

The other thing I'm playing with is seeing how close I can get to cleanly expressing grammars in pure Ruby, without parsing an external format. The two things are pretty orthogonal, and I always enjoy looking at new ideas for parsing in general, not just at what I can abuse the Ruby parser into handling directly...

Your format is interesting. Am I right in reading the <name: ...> bit as a capture? If I were to map that to Ruby, I'd probably use the Hash syntax to approximate it, so you'd end up with something like "{expr: {first: ...}". Incorporating the actions without extra clutter would be a bit tricky: an operator overload won't let you attach a block, so you're stuck with at least a ->() {...} or lambda { ... }. Still, I think representing it reasonably cleanly would be possible (of course, this also looks like a simple enough format to just parse directly).
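A small Ruby sketch of both points: the first part checks that a block written after an operator expression attaches to the right-hand method call rather than to the operator, and the second carries the action as a lambda next to hash-style captures. `Probe`, `right_term`, the `:captures`/`:action` layout and `run_action` are hypothetical names for illustration, and the "matched" values are supplied by hand since there is no parser here.

```ruby
# Sketch only: Probe, right_term, the :captures/:action layout and
# run_action are hypothetical names, not part of any existing library.

# 1. In `left & right_term { ... }` Ruby attaches the block to the method
#    call `right_term`, never to the `&` operator call.
class Probe
  def &(other)
    other # hand back whatever the right operand evaluated to
  end
end

def right_term
  block_given? ? :block_went_to_right_term : :no_block
end

WHERE_BLOCK_WENT = Probe.new & right_term { :action }

# 2. So the action has to travel as a value, here a lambda stored next to
#    hash-style captures approximating <first: ...> and <second: ...>.
SUM_RULE = {
  captures: { first: :number, second: :number },
  action: ->(first:, second:) { first + second }
}.freeze

# Pair already-matched values with the capture names (in order) and pass
# them to the action as keyword arguments.
def run_action(rule, matched)
  named = rule[:captures].keys.zip(matched).to_h
  rule[:action].call(**named)
end

puts WHERE_BLOCK_WENT             # prints block_went_to_right_term
puts run_action(SUM_RULE, [1, 2]) # prints 3
```

The lambda is the smallest amount of "host language noise" that survives here; a keyword-argument lambda also lets the capture names double as parameter names.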

I've not settled on the specific operators yet, and the choice is a bit limited since you can't change Ruby's operator precedence. In particular, I do think the '/', which came from ABNF, might have been a poor choice, and I might revert to '|'. But for a taste of how close you can get to avoiding "host language noise" when embedding grammars directly in Ruby, here's a small fragment of the TOML grammar that will parse as valid Ruby given the right supporting code:

    toml               <= expression & 0*(newline & expression)
    expression         <=  ws & [keyval / table] & ws & [ comment ]
    ws                 <= 0*wschar
    wschar             <= 0x20 / 0x09
    newline            <= 0x0a / (0x0d & 0x0a)

So far, though, that excludes captures. I'm tempted to default to simply capturing terms by name.
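One concrete way Ruby's fixed precedence bites: in ABNF (RFC 5234), concatenation binds tighter than `/`, so `a b / c` means `(a b) / c`. In Ruby, `/` binds tighter than `&`, so `a & b / c` groups as `a & (b / c)`, whereas `|` binds looser than `&` and gives the ABNF-style grouping. A minimal sketch, using a hypothetical `Shown` class whose operators only record how Ruby grouped them:

```ruby
# Hypothetical helper: every operator returns a new Shown whose label
# records how Ruby grouped the expression.
class Shown
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def &(other)
    Shown.new("(#{label} & #{other.label})")
  end

  def /(other)
    Shown.new("(#{label} / #{other.label})")
  end

  def |(other)
    Shown.new("(#{label} | #{other.label})")
  end
end

a, b, c = %w[a b c].map { |name| Shown.new(name) }

puts((a & b / c).label) # prints (a & (b / c))
puts((a & b | c).label) # prints ((a & b) | c)
```

This is also why the `newline` rule in the fragment needs its parentheses: written as `0x0a / 0x0d & 0x0a`, Ruby would read it as `(0x0a / 0x0d) & 0x0a`.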

The trick to making the grammar fragment above valid Ruby is that it's defined inside a class that "uses" a set of refinements[1], which lexically override methods on a number of core classes only within the grammar definition. The grammar is then instance_eval'd within an object whose method_missing returns an object that acts as a reference to a named rule, so each rule is basically just a series of chained method calls. (Hence the annoying '&'. I think it may be viable to get rid of it by keeping track of state in the object whose method_missing gets invoked, but I'm not sure it'd be robust enough to be worth the slight reduction in visual clutter.)
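A minimal sketch of that mechanism, assuming nothing about the actual implementation beyond what's described: a refinement module redefines `Integer#/`, `Integer#&` and `Integer#*` (plus `Array#&`) only inside a class body that calls `using`, a builder object's `method_missing` turns bare identifiers like `ws` into rule references, and `instance_eval` runs the grammar block against that builder. All names (`GrammarOps`, `Seq`, `Alt`, `Rep`, `Opt`, `Ref`, `GrammarBuilder`, `GrammarRefinements`, `TomlFragment`) are hypothetical, and the nodes only record the grammar's structure; matching input against them is left out.

```ruby
# Sketch only: every module, class and constant name here is hypothetical.
# The nodes record the grammar's structure; nothing matches input yet.
module GrammarOps
  # Turn bare Ruby values into nodes: a one-element Array [x] means "optional x".
  def self.lift(value)
    value.is_a?(Array) ? Opt.new(lift(value.first)) : value
  end

  def &(other) # sequence: a & b
    Seq.new(self, GrammarOps.lift(other))
  end

  def /(other) # alternation: a / b
    Alt.new(self, GrammarOps.lift(other))
  end
end

Seq = Struct.new(:left, :right) { include GrammarOps }
Alt = Struct.new(:left, :right) { include GrammarOps }
Rep = Struct.new(:at_least, :term) { include GrammarOps }
Opt = Struct.new(:term) { include GrammarOps }

# A reference to a named rule; `name <= body` stores body under name.
Ref = Struct.new(:name) do
  include GrammarOps
  attr_accessor :grammar # not a Struct member, so == ignores it

  def <=(body)
    grammar.rules[name] = GrammarOps.lift(body)
  end
end

class GrammarBuilder
  attr_reader :rules

  def initialize
    @rules = {}
  end

  # Bare identifiers in the grammar block (toml, ws, ...) land here.
  def method_missing(name, *args, &block)
    return super unless args.empty? && block.nil?

    Ref.new(name).tap { |ref| ref.grammar = self }
  end

  def self.define(&grammar)
    builder = new
    builder.instance_eval(&grammar) # self becomes the builder
    builder.rules
  end
end

module GrammarRefinements
  refine Integer do
    def /(other) # 0x20 / 0x09: alternation, not division
      Alt.new(self, GrammarOps.lift(other))
    end

    def &(other) # 0x0d & 0x0a: sequence, not bitwise AND
      Seq.new(self, GrammarOps.lift(other))
    end

    def *(other) # 0*wschar: repetition, not multiplication
      Rep.new(self, GrammarOps.lift(other))
    end
  end

  refine Array do
    def &(other) # [a] & b: optional a, then b (not set intersection)
      Seq.new(GrammarOps.lift(self), GrammarOps.lift(other))
    end
  end
end

class TomlFragment
  using GrammarRefinements # active only until the end of this class body

  RULES = GrammarBuilder.define do
    toml       <= expression & 0*(newline & expression)
    expression <= ws & [keyval / table] & ws & [comment]
    ws         <= 0*wschar
    wschar     <= 0x20 / 0x09
    newline    <= 0x0a / (0x0d & 0x0a)
  end
end

p TomlFragment::RULES.keys # => [:toml, :expression, :ws, :wschar, :newline]
p TomlFragment::RULES[:wschar] == Alt.new(0x20, 0x09) # => true
```

Because refinements are lexical, the grammar block still sees the refined `Integer` methods even though `instance_eval` runs it with the builder as `self`. Captures aren't modelled in this sketch.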

Without the refinements feature, we'd have to wrap the integers etc. if we didn't want to monkey-patch the standard classes and break everything. (This kind of DSL is the only really useful case I've found for refinements so far.)
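A minimal sketch of that trade-off, using a hypothetical `ByteAlt` node: the refinement changes `Integer#/` only between `using` and the end of the class body, so the same expression `0x20 / 0x09` is still plain integer division everywhere else in the file, while the refinement-free alternative has to wrap each literal in a (likewise hypothetical) `Byte` object.

```ruby
# Hypothetical node type for "this byte or that byte".
ByteAlt = Struct.new(:left, :right)

module ByteAltRefinement
  refine Integer do
    # Integer#/ builds a grammar node, but only where this refinement is active.
    def /(other)
      ByteAlt.new(self, other)
    end
  end
end

BEFORE = 0x20 / 0x09 # ordinary integer division: 32 / 9 == 3

class RefinedScope
  using ByteAltRefinement
  INSIDE = 0x20 / 0x09 # a ByteAlt with left 32 and right 9
end

AFTER = 0x20 / 0x09 # the refinement ended with the class body: 3 again

# The refinement-free, monkey-patch-free alternative: wrap every literal.
Byte = Struct.new(:value) do
  def /(other)
    ByteAlt.new(value, other.value)
  end
end

WRAPPED = Byte.new(0x20) / Byte.new(0x09) # same node, noisier grammar
```

A global monkey-patch of `Integer#/` would instead change division for all code in the process, including libraries.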

[1] https://docs.ruby-lang.org/en/master/syntax/refinements_rdoc...