| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by chriswarbo 3326 days ago

There are a few nice XML processing utilities. I tend to use xmlstarlet and/or xidel. This lets me use XPath, jQuery-style selectors, etc.

I agree that jq is really nice though. In particular, I still find JSON nicer than XML in the small-scale (e.g. scripts for transforming ATOM feeds) because:

- No DTDs means no unexpected network access or I/O failures during parsing

- No namespaces means names are WYSIWYG (no implicit prefixes which may/may not be needed, depending on the document)

- All text is in strings, rather than 'in between' elements

- No redundant element/attribute distinction

Even with tooling, these annoyances with XML leak through. As an example, xmlstarlet can find the authors in an ATOM file using an XPath query like '//author'; except if the document contains a default namespace, in which case it'll return no results since that XPath isn't namespaced.

This sort of silently-failing, document-dependent behaviour is really frustrating; requiring two branches (one for documents with a default-namespace, one for documents without) and text-based bash hackery to look for and dig out any default namespace prior to calling xmlstarlet :(

http://xmlstar.sourceforge.net

http://www.videlibri.de/xidel.html