Hacker News new | ask | show | jobs
by ianbicking 1620 days ago
When you describe it that way it reminds me of SAX [1] – I always hated SAX, but eventually realized it was kind of a tokenizer that left it up to the developer to figure out how to turn that into a compiler, though in this case compiling XML input into some internal data structure or action.

[1] https://en.wikipedia.org/wiki/Simple_API_for_XML

2 comments

Weirdly, but sometimes ideas churn around HN, I just replied to another thread about this. See my reply on this thread https://news.ycombinator.com/item?id=29917060

At BitFlash, one of the things we had to build was a SAX parser for the SVG DOM. I used the DSL of the W3C spec to compile the SAX parser.

One of the more strange things I did was in some contexts (think old school Blackberry) we had a server to pre-parse the SVG so we "knew" it was clean (I'm still sceptical 20 years later this is ever a good strategy, but take it as a given). Because we knew the SVG was clean, there was a faster way to parse the XML than reading the tokens.

I used my magic Perl script transformer to compute the lowest entropy decision tree to identify a token with the fewest comparisons, which was surprisingly way more efficient than a trie.

The real annoyance with SAX is its push model. Similar pull APIs (e.g. XmlReader in .NET: https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmlre...) let you parse XML in much the same way as a recursive-descent parser parses text, with the reader working much like a lexer.