Hacker News
by taeric 1168 days ago
JSON is garbage to read, largely because of how much needs escaping. That is mostly fine for smaller documents, but there is a reason YAML and TOML both gained traction over raw JSON for config files.

And I don't make any real defense of some of the darker corners of XML. In particular, I already criticized entities being a bit too much. Namespaces are also something that, while I can see the desire, the implementation is way too much for most of us.

JSON schema is going to be cursed for a long time. Just the odd treatment of numbers will be a problem. (In particular, that the supported set is a subset of the numbers that JavaScript itself supports is... awkward.)

I also confess, though, that I'm not clear why I would want a null in the middle of a string. That feels like a gun loaded and aimed squarely at a foot.


Most languages (C#, Java, Rust, JavaScript, etc.) support nulls in the middle of strings so it can be a security vulnerability if you try to serialize untrusted input to XML. I'd much rather be able to encode anything my input language considers a string and deal with excessive escaping than need to worry about what I'm going to do with inputs that my serialization language cannot support.
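
To make the mismatch concrete, here is a minimal Python sketch using only the standard library. It assumes `xml.etree.ElementTree` (backed by expat) as a stand-in for "an XML parser"; other parsers may report the problem differently. The helper name `xml_parses` is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

s = "a\x00b"  # a string with an embedded NUL (U+0000)

# JSON escapes the NUL as \u0000 and round-trips it.
encoded = json.dumps(s)
json_ok = json.loads(encoded) == s


def xml_parses(doc: str) -> bool:
    """Hypothetical helper: True if ElementTree's parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw_nul_ok = xml_parses("<r>a\x00b</r>")   # raw NUL in element content
ref_nul_ok = xml_parses("<r>a&#0;b</r>")   # NUL written as a character reference

print(encoded, json_ok, raw_nul_ok, ref_nul_ok)
```

Neither XML form is accepted, because XML 1.0 excludes U+0000 from its allowed characters, both raw and as a character reference.
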
I'm curious what the vulnerability is? Also not clear what the null character is. Any links I can follow?

And again, if this is your line in the sand, how do you serialize NaN and Infinity in JSON?
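
For reference, a small sketch of how one common serializer handles this, assuming Python's standard `json` module. The JSON grammar has no token for NaN or Infinity, so Python emits non-standard tokens by default and refuses outright when asked to be strict. The helper name `strict_dumps` is made up for illustration.

```python
import json

# By default Python emits the non-standard tokens NaN and Infinity.
nan_default = json.dumps(float("nan"))
inf_default = json.dumps(float("inf"))


def strict_dumps(value):
    """Hypothetical helper: serialize with allow_nan=False, or return None on refusal."""
    try:
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None


strict_nan = strict_dumps(float("nan"))
strict_num = strict_dumps(1.5)

print(nan_default, inf_default, strict_nan, strict_num)
```
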

Edit: Playing with this a bit, I'd actually assume that allowing \0 would be a vulnerability. I was curious how browsers treat it, and I see that parsing to an HTML document seems to just drop the characters. Fun little rabbit hole to jump in!

Yeah, that's why I consider it to be a breeding ground for vulnerabilities. People will probably just assume the XML serializer can handle any string in their language of choice and not handle those edge cases. What I ended up doing for my use case was to encode nulls as "&#0;" but within a CDATA section so it was interpreted literally (choosing ambiguity over omission). The best way would probably be to have some sort of special <null /> element, but there isn't such a thing within the standard. There is xsi:nil, but that is really indicating something else.
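
A rough sketch of that CDATA workaround in Python, to show where the ambiguity comes from. The helper `encode_text` and the `<r>` wrapper element are made up for illustration, and the sketch only handles NUL, not the other control characters XML 1.0 rejects.

```python
import xml.etree.ElementTree as ET


def encode_text(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, writing NUL as the literal characters &#0;."""
    body = text.replace("\x00", "&#0;")
    # A CDATA section cannot contain "]]>", so split that sequence across two sections.
    body = body.replace("]]>", "]]]]><![CDATA[>")
    return f"<r><![CDATA[{body}]]></r>"


doc_with_nul = encode_text("a\x00b")       # input had a real NUL
doc_with_literal = encode_text("a&#0;b")   # input had the five characters &#0;

# Inside CDATA the reference is not expanded, so the parser hands back literal text.
decoded = ET.fromstring(doc_with_nul).text

print(doc_with_nul == doc_with_literal, decoded)
```

Both inputs produce the same document, so a reader cannot tell a real NUL from the literal text "&#0;". Nothing is dropped, but the mapping is not reversible without an extra convention.
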
But what is the vulnerability? And what is a null character doing in a text document?

If you are just worried about data loss, having null allowed in text segments is already begging for failure, as C programs will almost certainly get them wrong.
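
The C side of this can be illustrated from Python with `ctypes`, as a small sketch: a buffer holding an embedded NUL looks shorter as soon as it is read the way C reads strings.

```python
import ctypes

# Eight bytes are allocated: the seven given plus a trailing terminator.
buf = ctypes.create_string_buffer(b"abc\x00def")

as_c_string = buf.value  # read up to the first NUL, as C string functions would
all_bytes = buf.raw      # the full underlying memory

print(as_c_string, all_bytes)
```

Anything after the first NUL is invisible to code that treats the buffer as a C string, even though the bytes are still there.
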

If you are transferring binary, base64 or similar will already cover you.
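
A minimal sketch of the base64 route, assuming Python's standard `base64` and `xml.etree.ElementTree`. The `<blob>` element name is made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

payload = b"a\x00b"  # bytes with an embedded NUL

# Base64 output uses only ASCII letters, digits, '+', '/' and '=',
# all of which are legal XML character data.
elem = ET.Element("blob")
elem.text = base64.b64encode(payload).decode("ascii")
doc = ET.tostring(elem, encoding="unicode")

restored = base64.b64decode(ET.fromstring(doc).text)

print(doc, restored == payload)
```

The cost is that the content is no longer readable as text, and sender and receiver have to agree that the element holds base64.
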

And again, if this is a strike against XML, how do you represent NaN in a JSON document? Do what DynamoDB does and wrap all numbers in quotes?
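
A sketch of what the numbers-in-quotes approach could look like, assuming Python's standard `json`. The helpers `encode_number` and `decode_number` and the `values` key are made up for illustration; this is not DynamoDB's actual wire format, only the general idea of carrying numbers as strings.

```python
import json
import math


def encode_number(x: float) -> str:
    """Hypothetical helper: carry a float as a string, e.g. '1.5', 'nan', 'inf'."""
    return repr(x)


def decode_number(s: str) -> float:
    """Hypothetical helper: float() accepts 'nan', 'inf' and '-inf' as well as digits."""
    return float(s)


values = [1.5, float("nan"), float("inf")]

# Every value is a JSON string, so strict serialization has nothing to object to.
doc = json.dumps({"values": [encode_number(v) for v in values]}, allow_nan=False)

decoded = [decode_number(s) for s in json.loads(doc)["values"]]

print(doc, decoded)
```

The document is valid JSON, but the number-ness now lives in a convention both sides must share rather than in the format itself.
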