by kstenerud 823 days ago
I built a grammar to tackle these sorts of problems when I had trouble writing formal grammar notations for my binary data format. It's even got a syntax highlighter.

https://dogma-lang.org/

So far it's been able to describe 90% of what's out there. Some examples:

- 802.3 layer 2 Ethernet: https://github.com/kstenerud/dogma/blob/master/v1/examples/8...

- Microsoft ICO format: https://github.com/kstenerud/dogma/blob/master/v1/examples/i...

- Android Dex v39: https://github.com/kstenerud/dogma/blob/master/v1/examples/d...

- IPv4: https://github.com/kstenerud/dogma/blob/master/v1/examples/i...

- DNS query: https://github.com/kstenerud/dogma/blob/master/v1/examples/d...

- Microsoft Minidump: https://github.com/kstenerud/dogma/blob/master/v1/examples/m...

- Concise Binary Encoding: https://github.com/kstenerud/concise-encoding/blob/master/cb...

- Concise Text Encoding: https://github.com/kstenerud/concise-encoding/blob/master/ct...

1 comment

The main feature of interval parsing appears to be that it can jump over content, so that a later part of a file can be parsed without knowing everything that comes before it. Does Dogma have similar expressiveness?
Yes, the `offset` function does this by specifying a bit offset to branch to. For example, see the ICO `dir_entry` in the directory list of icon resources in the file: https://github.com/kstenerud/dogma/blob/master/v1/examples/i... It uses `image_offset*8` because every offset in an ICO file is a byte offset (8 bits per byte), while `offset` takes a bit offset.

It's also needed to parse Minidump. For example: https://github.com/kstenerud/dogma/blob/master/v1/examples/m... and https://github.com/kstenerud/dogma/blob/master/v1/examples/m...
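
To make the offset-jumping idea concrete for anyone who doesn't read Dogma, here's a rough Python sketch of the same thing for ICO. It is not Dogma and not taken from the linked grammar; the function name `read_ico_images` is made up for illustration, and the field layout is the usual 6-byte ICONDIR header followed by 16-byte directory entries. The byte-to-bit multiplication is only there to mirror the `image_offset*8` step.

```python
import struct


def read_ico_images(data: bytes) -> list[bytes]:
    """Return each icon payload by following the directory's offsets.

    Hypothetical helper for illustration only; it does no bounds checking.
    """
    # ICONDIR header: reserved (must be 0), type (1 = icon), entry count.
    reserved, ico_type, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or ico_type != 1:
        raise ValueError("not an ICO file")

    images = []
    for i in range(count):
        # Each directory entry is 16 bytes and follows the 6-byte header.
        entry_pos = 6 + 16 * i
        (_width, _height, _colors, _entry_reserved,
         _planes, _bpp, size, image_offset) = struct.unpack_from(
            "<BBBBHHII", data, entry_pos)

        # The file stores a byte offset; a bit-addressed grammar needs
        # image_offset * 8. Convert there and back to show the equivalence.
        bit_offset = image_offset * 8
        start = bit_offset // 8

        # Jump straight to the payload without parsing what lies between.
        images.append(data[start:start + size])
    return images
```

The point is that each payload is found purely from its directory entry, so the payloads can sit anywhere in the file, in any order.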