by halostatue 1175 days ago
Every JSON schema is also a potential DSL that reinvents everything. Yes, there seems to be some convergence on things, but object arrays in XML aren’t really any more complex than object arrays in JSON — there just might be multiple ways to represent them.

For this JSON:

    {
      "part_numbers": [1, 2, 3, 4, 5]
    }
You have two main ways to represent these in XML:

    <!-- repetition = array -->
    <order>
      <part_number>1</part_number>
      <part_number>2</part_number>
      <part_number>3</part_number>
      <part_number>4</part_number>
      <part_number>5</part_number>
    </order>

    <!-- wrapped repetition -->
    <order>
      <part_numbers>
        <part_number>1</part_number>
        <part_number>2</part_number>
        <part_number>3</part_number>
        <part_number>4</part_number>
        <part_number>5</part_number>
      </part_numbers>
    </order>
Is this better than JSON? No, not particularly. But it’s no less clear than the JSON, and it compresses pretty well (it compresses better for larger documents, obviously).
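
As a rough illustration (not something from the original comment), both XML shapes above can be read back into the same list the JSON carries. This sketch uses only Python's standard library; the helper name `part_numbers` is made up for the example.

```python
import json
import xml.etree.ElementTree as ET

JSON_DOC = '{"part_numbers": [1, 2, 3, 4, 5]}'

# Build the two XML shapes from the comment: bare repetition and wrapped repetition.
ITEMS = "".join(f"<part_number>{n}</part_number>" for n in range(1, 6))
REPEATED = f"<order>{ITEMS}</order>"
WRAPPED = f"<order><part_numbers>{ITEMS}</part_numbers></order>"


def part_numbers(xml_text):
    """Collect every <part_number> in document order, at any depth."""
    root = ET.fromstring(xml_text)
    return [int(el.text) for el in root.iter("part_number")]


expected = json.loads(JSON_DOC)["part_numbers"]
print(part_numbers(REPEATED) == expected)
print(part_numbers(WRAPPED) == expected)
```

Because `iter()` walks all descendants, the same reader handles both shapes; a stricter consumer would have to pick one and reject the other.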

The larger problem with XML is that the tooling is often lacking outside of Java and C#/.NET, and none of it is well-built for the sort of streaming manipulation that `jq` does (such tooling exists, but IMO XSLT is one of the least usable ideas from the XML camp). JSON support, by contrast, is pretty universal, even if advanced things like JSONPath and JSON Schema aren’t.

I also think that there’s a problem when you have to choose between SAX and DOM parsing early in your process. Most JSON usage is the equivalent of using a DOM parser because the objects are expected to be relatively small, but many XML systems are built for much larger documents, and therefore need to parse the stream because the memory use otherwise would be unacceptable. The use of a JSON streaming parser is much rarer, IME.
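
To make the SAX-versus-DOM trade-off concrete, here is a minimal sketch of stream-style parsing with Python's `xml.etree.ElementTree.iterparse`, which hands back elements as they close instead of building the whole tree first. The function name `stream_part_numbers` and the sample document are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_part_numbers(source):
    """Yield part numbers one at a time as their elements finish parsing."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "part_number":
            yield int(elem.text)
            # Drop the text we no longer need; the emptied element
            # itself stays attached to its parent.
            elem.clear()


# A small in-memory document standing in for a large file or socket.
doc = (
    b"<order>"
    + b"".join(b"<part_number>%d</part_number>" % n for n in range(1, 6))
    + b"</order>"
)

total = sum(stream_part_numbers(io.BytesIO(doc)))
print(total)
```

The DOM-style equivalent is `ET.parse(source)` followed by a walk over the finished tree, which is simpler but holds the whole document in memory.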


Where XML shines is when you pass more complex data types than numbers and strings. If you repeated your example for an array of dates, strictly speaking you couldn't even generate the JSON: we'd first have to agree on what string representation of a date to use. For XML it's built into the spec.
In JSON the de facto standard for datetime is (because of JavaScript) very much the Unix msec timestamp (which is always in UTC), so while it's not hardcoded in the spec you basically need to be an idiot not to do it like that, and it removes one huge headache of XML dates, which is timezones.
I don’t think that I’ve ever seen msec timestamps passed around, because JSON numbers are floats, which means that there’s a limit to the precision available (which also implies that currency amounts should be passed as decimal strings in JSON for safety).
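
A small sketch of where the float limit actually bites, assuming the consumer parses JSON numbers as IEEE 754 doubles (as JavaScript does). Doubles hold integers exactly only up to 2**53, so present-day millisecond values still fit, while nanosecond values and decimal fractions such as currency do not. The helper `survives_double` and the payload field names are made up for the example.

```python
import json
from decimal import Decimal

# IEEE 754 doubles hold every integer exactly only up to 2**53.
EXACT_INT_LIMIT = 2 ** 53


def survives_double(n):
    """True if the integer n round-trips through a double unchanged."""
    return int(float(n)) == n


ms_timestamp = 1_700_000_000_000               # milliseconds since the epoch (late 2023)
ns_timestamp = ms_timestamp * 1_000_000 + 1    # the same instant in nanoseconds, plus 1 ns

print(survives_double(ms_timestamp))   # below 2**53, so it is exact
print(survives_double(ns_timestamp))   # above 2**53, the last digits are lost

# Currency as decimal strings: parse straight into Decimal, never via float.
payload = json.loads('{"amount": "19.99", "tax": "0.10"}')
total = Decimal(payload["amount"]) + Decimal(payload["tax"])
print(total)
```

When amounts arrive as bare JSON numbers instead, Python's `json.loads(text, parse_float=Decimal)` avoids the trip through a binary float.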

Suggesting that msec timestamps resolves timezone issues is naïve at best, because anytime you are passing something that refers to a real time (that is, it is significant to humans) rather than an instant time (that is, it is something like an event log timestamp), you are dealing with time in a particular place, which has human impact — cultural, legal, linguistic.

Passing around timestamps as RFC 3339 UTC strings, accompanied by timezone names and offsets (much like one should be doing in databases), is what would be recommended for real (human) times.
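
One way this could look in practice, as a sketch only: carry the UTC instant, the local time with its offset, and the zone name side by side. The field names (`instant`, `local`, `zone`) and the function `encode_human_time` are hypothetical, and a fixed offset stands in for a real timezone database lookup.

```python
from datetime import datetime, timedelta, timezone


def encode_human_time(local_dt, zone_name):
    """Return the UTC instant, the offset-bearing local time, and the zone name."""
    if local_dt.utcoffset() is None:
        raise ValueError("local_dt must be timezone-aware")
    utc = local_dt.astimezone(timezone.utc)
    return {
        # RFC 3339 allows "Z" as the spelling of a +00:00 offset.
        "instant": utc.isoformat().replace("+00:00", "Z"),
        "local": local_dt.isoformat(),
        "zone": zone_name,
    }


# 09:30 on a January morning at UTC-05:00, labelled with its zone name.
eastern_standard = timezone(timedelta(hours=-5))
meeting = datetime(2024, 1, 15, 9, 30, tzinfo=eastern_standard)
encoded = encode_human_time(meeting, "America/Toronto")
print(encoded)
```

Keeping the zone name matters because the offset alone cannot say how the same wall-clock time should be interpreted after a daylight-saving or legal change.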

Okay, so the point at which you need to adopt a schema language in toy examples is earlier with JSON, but in most practical cases you’ll want to do that in either JSON or XML (because, even if you are only using built-in types, you’ll still want to communicate the shape), so this objection is kind of meaningless.
Well, no. Because JSON, and by extension OpenAPI, lacks a Date type, you can't easily add validations about dates to those schemas. For example, you can't say that a particular date must be in the past in an OpenAPI spec, because it has no concept of a date. The best you can do is a regex on the strings you call dates, but that falls apart pretty quick.
> Like you can't say this particular date must be in the past in an OpenAPI spec because it has no concept of a date.

I don't get these types of arguments.

There's zero reason you can't write code that parses a date in an expected format (and throws an error if the date is formatted incorrectly) and then checks that the date is in the past.
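
A minimal sketch of that hand-written check in Python, assuming dates arrive as `YYYY-MM-DD` strings. The function name `parse_past_date` and the injectable `today` parameter are made up for the example.

```python
from datetime import date


def parse_past_date(text, today=None):
    """Parse an ISO 8601 calendar date and require it to be before today."""
    today = today or date.today()
    try:
        # Accepts YYYY-MM-DD; Python 3.11+ also accepts other ISO 8601 forms.
        value = date.fromisoformat(text)
    except ValueError as exc:
        raise ValueError(f"not an ISO 8601 date: {text!r}") from exc
    if value >= today:
        raise ValueError(f"date must be in the past: {text!r}")
    return value


reference_day = date(2024, 1, 1)
print(parse_past_date("2020-02-29", today=reference_day))
```

Passing `today` explicitly keeps the check deterministic in tests; callers in production would normally omit it.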

Yes, it does mean you'll spend time writing more code (You know, the job you're being paid to do?), and it would be nice if your data format supported such automatic checking functionality out of the box, but to say "It can't be done!" is just plain silly.

> "It can't be done!" is just plain silly

It's a good thing I didn't say that.

> Yes, it does mean you'll spend time writing more code

The whole point of WSDLs and OpenAPI is to minimize the amount of time it takes to consume your API. Saying you have to write more code is highlighting the shortcomings of OpenAPI at doing the only thing it's built to do. Which is why companies have largely punted on providing OpenAPI specs in favor of maintaining libraries in a handful of popular languages.