Hacker News
by duairc 2655 days ago
> Writing my own docx parser? Sure, that will be one mythical man century of work.

I'm not sure that's actually a great example. I had a project a while ago where I needed to extract certain information from Word and Excel files, and it was less work to just write my own parser (it's just XML in a ZIP file) that got exactly the information I needed than to figure out all the complexity of using a full-blown docx/xlsx parser. It ended up being 100 lines of Haskell, and half of that was imports.

https://gist.githubusercontent.com/duairc/db3e99a7808668e84e...

Edit: The docx part of it is only 10 lines of code.
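For illustration, here is a minimal Python sketch of the same "it's just XML in a ZIP file" idea. It is not the Haskell from the gist above (that link is truncated here). It uses only the standard library's `zipfile` and `xml.etree.ElementTree`. It assumes the usual part names `word/document.xml`, `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml`; a robust reader would resolve sheet paths through `xl/workbook.xml` and its relationships file. The function names `docx_paragraphs` and `xlsx_rows` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML and SpreadsheetML main namespaces (Office Open XML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each <w:p> paragraph in word/document.xml.

    `source` is a filename or a binary file-like object. Formatting is
    ignored, and headers/footnotes (stored in other parts) are not read.
    Text boxes contain nested paragraphs, so their text may repeat.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is split across runs, each holding <w:t> nodes.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def xlsx_rows(source, sheet_path="xl/worksheets/sheet1.xml"):
    """Return the cell values of one worksheet as a list of rows of strings.

    ASSUMPTION: the sheet lives at `sheet_path`. Cells are returned in
    document order; cells absent from the XML are skipped, so column
    positions are not rebuilt from each cell's `r` attribute.
    """
    with zipfile.ZipFile(source) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in sst.iter(f"{{{S_NS}}}si"):
                # Rich-text strings are split across several <t> nodes; join them.
                shared.append("".join(t.text or "" for t in si.iter(f"{{{S_NS}}}t")))
        sheet = ET.fromstring(zf.read(sheet_path))
    rows = []
    for row in sheet.iter(f"{{{S_NS}}}row"):
        values = []
        for c in row.iter(f"{{{S_NS}}}c"):
            v = c.find(f"{{{S_NS}}}v")
            if c.get("t") == "s" and v is not None:
                # Shared-string cell: <v> holds an index into sharedStrings.xml.
                values.append(shared[int(v.text)])
            elif c.get("t") == "inlineStr":
                # Inline string cell: text is stored directly under <is><t>.
                values.append("".join(t.text or "" for t in c.iter(f"{{{S_NS}}}t")))
            else:
                # Numbers, booleans, cached formula results: raw <v> text.
                values.append(v.text if v is not None and v.text else "")
        rows.append(values)
    return rows
```

Usage would look like `docx_paragraphs("report.docx")` or `xlsx_rows("data.xlsx")` (hypothetical filenames). As the thread notes, this only works because the task is "pull out a few tags". Anything that needs to edit and re-save the document is where a full library earns its weight.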

There's a slight difference between extracting a few tags from an XML file and building a manipulable AST of it.
There is, but if your problem requires just the former, it's faster and better to build it yourself than to pull in a heavy third-party dependency (of which you'll use 1% anyway).
Yes, but if it requires the latter, you end up with an uncontrollable mess of regular expressions which can accidentally parse the language needed to summon the great elder ones.

The MediaWiki parser is a perfect example of what can go wrong with simple solutions.

That's a pretty neat trick.