by hayleox 3327 days ago
To be honest, I'm really excited about the prospect of JSON-based feeds. Right now, there's no easy way to work with Atom/RSS feeds on the command line (that I know of anyway), which is something I often wish I could do. With a JSON feed, I can just throw the data at jq (https://stedolan.github.io/jq/) and have a bash script hacked together in 10 minutes to do whatever I want with the feed.

I give you libxml:

    xmllint --xpath '//element/@attribute'
There's a good chance it's already installed on your Mac.
To avoid the hassle of handling XML namespaces (e.g. in an Atom feed...), just do:

    xmllint --xpath '//*[local-name()="element"]/@attribute'
Note: for consistency, namespaces are not needed for attribute names.

http://stackoverflow.com/questions/4402310/how-to-ignore-nam...
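
For anyone who would rather stay out of XPath, the same local-name() idea can be sketched with Python's standard library. This is only an illustration: the sample feed and the helper names (`local_name`, `attrs_by_local_name`) are made up here, not taken from any tool mentioned above.

```python
import xml.etree.ElementTree as ET

# Made-up Atom snippet with a default namespace.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><link href="http://example.com/a"/></entry>
  <entry><link href="http://example.com/b"/></entry>
</feed>"""

def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only the name part.
    return tag.rsplit("}", 1)[-1]

def attrs_by_local_name(xml_text, element, attribute):
    # Rough equivalent of //*[local-name()="element"]/@attribute
    root = ET.fromstring(xml_text)
    return [el.get(attribute)
            for el in root.iter()
            if local_name(el.tag) == element and el.get(attribute) is not None]

hrefs = attrs_by_local_name(ATOM, "link", "href")
print(hrefs)
```

Because the namespace is stripped before comparing, the same call should behave identically on feeds with and without a default namespace.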

There are a few nice XML processing utilities. I tend to use xmlstarlet and/or xidel. This lets me use XPath, jQuery-style selectors, etc.

I agree that jq is really nice though. In particular, I still find JSON nicer than XML at small scale (e.g. scripts for transforming Atom feeds) because:

- No DTDs means no unexpected network access or I/O failures during parsing

- No namespaces means names are WYSIWYG (no implicit prefixes which may/may not be needed, depending on the document)

- All text is in strings, rather than 'in between' elements

- No redundant element/attribute distinction

Even with tooling, these annoyances with XML leak through. As an example, xmlstarlet can find the authors in an Atom file using an XPath query like '//author', except if the document contains a default namespace, in which case it'll return no results since that XPath isn't namespaced.

This sort of silently failing, document-dependent behaviour is really frustrating: it requires two branches (one for documents with a default namespace, one for documents without) and text-based bash hackery to look for and dig out any default namespace before calling xmlstarlet :(

http://xmlstar.sourceforge.net

http://www.videlibri.de/xidel.html
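
The default-namespace failure described above is not specific to xmlstarlet; a rough reproduction with Python's ElementTree shows the same shape. The two documents and the helpers (`count_authors`, `root_namespace`, `find_authors`) are invented for this sketch, and it assumes the root element is unprefixed, so the root's namespace is the default one.

```python
import xml.etree.ElementTree as ET

# Same structure, with and without a default namespace (both made up).
PLAIN = "<feed><entry><author><name>Ann</name></author></entry></feed>"
ATOM = ('<feed xmlns="http://www.w3.org/2005/Atom">'
        "<entry><author><name>Ann</name></author></entry></feed>")

def count_authors(xml_text, path=".//author"):
    # The "obvious" un-namespaced query.
    return len(ET.fromstring(xml_text).findall(path))

def root_namespace(root):
    # "{uri}feed" -> "uri"; "" when the root has no namespace.
    return root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

def find_authors(xml_text):
    # Dig the namespace out of the root first, then build the matching query.
    root = ET.fromstring(xml_text)
    ns = root_namespace(root)
    path = f".//{{{ns}}}author" if ns else ".//author"
    return root.findall(path)

plain_hits = count_authors(PLAIN)  # 1
atom_hits = count_authors(ATOM)    # 0, with no error raised
print(plain_hits, atom_hits, len(find_authors(ATOM)))
```

The second count comes back as zero without any error, which is exactly the silent, document-dependent behaviour being complained about; `find_authors` is the two-branch workaround in miniature.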

I have an RSS client written in Rust that builds as a command line program.[1] I wrote this in 2015, and it needs to be modernized and turned into a library crate, but it will build and run with the current Rust environment. It's not that hard to parse XML in Rust. Most of the code volume is error handling.

[1] https://github.com/John-Nagle/rust-rssclient

Surely there's an xml->json converter somewhere.

It's kind of tough to convert XML directly to other formats (including, but not limited to, JSON), because there are a lot of XML features that don't map cleanly onto JSON, such as:

• Text nodes (especially whitespace text nodes)

• Comments

• Attributes vs. child nodes

• Ordering of child nodes
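
To make a couple of the bullets above concrete, here is a deliberately naive converter in Python. It is a hypothetical "obvious" mapping written for this example (attributes and child elements share one dict), not the behaviour of any particular converter.

```python
import json
import xml.etree.ElementTree as ET

def naive_to_json(el):
    """Hypothetical naive mapping: attributes and children share one dict."""
    if len(el) == 0 and not el.attrib:
        return el.text or ""          # leaf element -> plain string
    out = dict(el.attrib)             # attributes become keys...
    for child in el:
        out[child.tag] = naive_to_json(child)  # ...and so do child elements
    return out

as_attribute = ET.fromstring('<post id="7"/>')
as_child = ET.fromstring("<post><id>7</id></post>")
mixed = ET.fromstring("<p>Hello <b>big</b> world</p>")

print(json.dumps(naive_to_json(as_attribute)))  # {"id": "7"}
print(json.dumps(naive_to_json(as_child)))      # {"id": "7"}
print(json.dumps(naive_to_json(mixed)))         # {"b": "big"}
```

Two different documents collapse into the same JSON, and the text around `<b>` disappears along with its position, so nothing downstream can reconstruct the original XML.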

As it happens, XSLT 3.0 and XPath 3.0 both have well-documented and stable features for doing exactly this. Round-tripping XML to JSON and back is a solved problem - check it out some time; it may surprise you.

Are you talking about json-to-xml and xml-to-json?

From the XSLT spec [0]:

"Converts an XML tree, whose format corresponds to the XML representation of JSON defined in this specification, into a string conforming to the JSON grammar"

It can't take an arbitrary XML document and turn it into JSON; it can only take XML documents that conform to a specific format.

You can safely round-trip from JSON to XML and back to JSON. That's trivial because JSON's feature set is a subset of XML's.

What you can't safely do is round-trip from arbitrary XML to JSON and back to XML. That's because, as the parent said, there are features in XML that don't exist in JSON. That means you are forced to find a way to encode them using the features you do have, but then you can't tell your encoding apart from valid values.

[0] https://www.w3.org/TR/xslt-30/#func-xml-to-json
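
A small Python sketch of the safe direction, loosely modelled on the XML representation of JSON in the spec (map, array, string, number, boolean, null elements with a key attribute). It is an approximation, not the spec's functions: it omits the namespace the spec uses, and it does not handle strings containing characters that XML cannot represent. The function names just echo the spec's for readability.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(value, key=None):
    if isinstance(value, dict):
        el = ET.Element("map")
        for k, v in value.items():
            el.append(json_to_xml(v, k))
    elif isinstance(value, list):
        el = ET.Element("array")
        for v in value:
            el.append(json_to_xml(v))
    elif isinstance(value, bool):      # check before numbers: bool is an int subclass
        el = ET.Element("boolean")
        el.text = "true" if value else "false"
    elif value is None:
        el = ET.Element("null")
    elif isinstance(value, (int, float)):
        el = ET.Element("number")
        el.text = json.dumps(value)
    else:
        el = ET.Element("string")
        el.text = value
    if key is not None:
        el.set("key", key)             # object member names live in an attribute
    return el

def xml_to_json(el):
    if el.tag == "map":
        return {c.get("key"): xml_to_json(c) for c in el}
    if el.tag == "array":
        return [xml_to_json(c) for c in el]
    if el.tag == "boolean":
        return el.text == "true"
    if el.tag == "null":
        return None
    if el.tag == "number":
        return json.loads(el.text)
    if el.tag == "string":
        return el.text or ""
    # Anything else is ordinary XML, which this vocabulary cannot express.
    raise ValueError(f"not part of the JSON vocabulary: <{el.tag}>")

doc = {"title": "Feed", "score": 1.5,
       "items": [{"id": 1, "read": False, "tags": [], "note": None}]}
xml_text = ET.tostring(json_to_xml(doc), encoding="unicode")
back = xml_to_json(ET.fromstring(xml_text))
print(back == doc)
```

Feeding it ordinary XML such as `<blink foo="bar">x</blink>` raises the `ValueError`, which is the point made above: the conversion back only accepts documents already written in the JSON-shaped vocabulary.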

You could conceivably serialize the DOM as a JSON object, but the representation would be very difficult to work with:

    {
      "type": "element",
      "name": "blink",
      "attributes": {
        "foo": "bar"
      },
      "children": [
        {
          "type": "text",
          "content": "example text"
        }
      ]
    }
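
For what it's worth, producing that shape is not much code. Here is a rough Python sketch using ElementTree; `dom_to_json` and `json_to_dom` are names invented for this example. It assumes the default parser, which drops comments and processing instructions, and namespaced names would show up in ElementTree's "{uri}name" spelling.

```python
import json
import xml.etree.ElementTree as ET

def dom_to_json(el):
    node = {"type": "element", "name": el.tag,
            "attributes": dict(el.attrib), "children": []}
    if el.text:                        # text before the first child
        node["children"].append({"type": "text", "content": el.text})
    for child in el:
        node["children"].append(dom_to_json(child))
        if child.tail:                 # text between this child and the next
            node["children"].append({"type": "text", "content": child.tail})
    return node

def json_to_dom(node):
    el = ET.Element(node["name"], node["attributes"])
    last = None                        # most recently appended child element
    for child in node["children"]:
        if child["type"] == "text":
            if last is None:
                el.text = (el.text or "") + child["content"]
            else:
                last.tail = (last.tail or "") + child["content"]
        else:
            last = json_to_dom(child)
            el.append(last)
    return el

source = '<blink foo="bar">example text</blink>'
tree = dom_to_json(ET.fromstring(source))
print(json.dumps(tree, indent=2))
print(ET.tostring(json_to_dom(tree), encoding="unicode"))
```

Child order and mixed text survive because everything sits in one ordered `children` list, but every query then has to walk `children` and filter on `type`, which is the "very difficult to work with" part.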