Hacker News
by slaymaker1907 1167 days ago
Most languages (C#, Java, Rust, JavaScript, etc.) support nulls (U+0000) in the middle of strings, and XML cannot represent that character, so it can be a security vulnerability if you try to serialize untrusted input to XML. I'd much rather be able to encode anything my input language considers a string and deal with excessive escaping than need to worry about what I'm going to do with inputs that my serialization language cannot support.
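
To make the mismatch concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The helper name survives_xml_round_trip is made up for illustration. It only shows that a string containing U+0000 is a valid string in the language yet does not survive being serialized and parsed back.

```python
import xml.etree.ElementTree as ET


def survives_xml_round_trip(s: str) -> bool:
    """Hypothetical helper: serialize s as element text, then parse it back."""
    elem = ET.Element("value")
    elem.text = s
    # ElementTree escapes &, < and > but does not check for characters
    # that XML forbids, so a NUL is written out as a raw byte.
    data = ET.tostring(elem)
    try:
        return ET.fromstring(data).text == s
    except ET.ParseError:
        # The parser rejects the document the serializer just produced.
        return False
```

An ordinary string such as "hello" comes back intact, while "a\x00b" is written without complaint and then rejected on parse, which is exactly the kind of edge case a caller is unlikely to have planned for.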

I'm curious what the vulnerability is? Also not clear what the null character is. Any links I can follow?

And again, if this is your line in the sand, how do you serialize NaN and Infinity in JSON?

Edit: Playing with this a bit, I'd actually assume that allowing \0 would be a vulnerability. I was curious how browsers treat it, and it looks like parsing into an HTML document just drops the characters? Fun little rabbit hole to jump into!

Yeah, that's why I consider it to be a breeding ground for vulnerabilities. People will probably just assume the XML serializer can handle any string in their language of choice and not handle those edge cases. What I ended up doing for my use case was to encode nulls as "&#0;" but within a CDATA section so it was interpreted literally (choosing ambiguity over omission). The best way would probably be to have some sort of special <null /> element, but there isn't such a thing within the standard. There is xsi:nil, but that is really indicating something else.
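
A rough Python sketch of the CDATA workaround described above, as I read it. The function names, the <value> element, and the decoding step are assumptions for illustration, not the exact code from that use case. It only handles U+0000 and the "]]>" terminator. Other forbidden control characters are left alone.

```python
import xml.etree.ElementTree as ET


def encode_text(s: str) -> str:
    """Hypothetical encoder: write NUL as the literal text '&#0;' inside CDATA."""
    marked = s.replace("\x00", "&#0;")
    # A CDATA section cannot contain ']]>', so split it across two sections.
    safe = marked.replace("]]>", "]]]]><![CDATA[>")
    return f"<value><![CDATA[{safe}]]></value>"


def decode_text(xml: str) -> str:
    """Hypothetical decoder: turn the literal marker back into NUL."""
    # Inside CDATA the parser does not expand '&#0;', so it arrives as text.
    text = ET.fromstring(xml).text or ""
    return text.replace("&#0;", "\x00")
```

The ambiguity is visible here: an input that already contains the five characters "&#0;" decodes to a null as well, so the original cannot be told apart from a real null. Nothing is silently dropped, though.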
But what is the vulnerability? And what is a null character doing in a text document?

If you are just worried about data loss, having null allowed in text segments is already begging for failure, as C programs will almost certainly get them wrong.
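
A small illustration of the C-side failure mode, sketched in Python with ctypes so it stays runnable without a compiler. The payload value is invented for the example.

```python
import ctypes

payload = b"user\x00admin"  # 10 bytes with a NUL in the middle

# create_string_buffer copies every byte into a C char array
# and appends a terminating NUL.
buf = ctypes.create_string_buffer(payload)

# .value reads the buffer the way a NUL-terminated C string is read:
# it stops at the first NUL.
as_c_string = buf.value

# .raw exposes the whole array, so the bytes after the NUL are still there.
full_bytes = buf.raw[:len(payload)]
```

Anything that treats the buffer as a C string sees only the part before the null, while length-aware code sees all of it. That disagreement is where both the data loss and the security problems tend to come from.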

If you are transferring binary, base64 or similar will already cover you.
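
For the binary case, a minimal sketch of the base64 route in Python. The <blob> element name and the two helper functions are assumptions for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def to_xml(payload: bytes) -> bytes:
    """Hypothetical helper: store arbitrary bytes as base64 text in an element."""
    elem = ET.Element("blob")
    # base64 output is plain ASCII letters, digits, '+', '/' and '=',
    # all of which are legal XML characters.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem)


def from_xml(data: bytes) -> bytes:
    """Hypothetical helper: recover the original bytes."""
    return base64.b64decode(ET.fromstring(data).text or "")
```

The cost is roughly a third more bytes and text that is no longer human-readable, which is fine for binary but awkward if the value is mostly ordinary text with the occasional null.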

And again, if this is a strike on XML, how do you represent NaN in a JSON document? Do what DynamoDB does and wrap all numbers in quotes?
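
For reference, a minimal Python sketch of the quote-wrapping approach. The {"N": ...} wrapper shape and the helper names are assumptions for illustration. Note that Python's json module emits a bare NaN token by default, which is not valid JSON, so the sketch passes allow_nan=False to get strict behaviour.

```python
import json
import math


def strict_json_accepts(x: float) -> bool:
    """Check whether strict JSON can carry x as a bare number."""
    try:
        json.dumps(x, allow_nan=False)
        return True
    except ValueError:
        return False


def dump_number(x: float) -> str:
    """Hypothetical wrapper: send the number as a string, e.g. {"N": "nan"}."""
    return json.dumps({"N": repr(x)}, allow_nan=False)


def load_number(doc: str) -> float:
    """Hypothetical wrapper: float() accepts 'nan', 'inf' and '-inf'."""
    return float(json.loads(doc)["N"])
```

Strict JSON refuses NaN and the infinities as numbers, and the string wrapper gets them through at the price of every consumer having to know the convention.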