by slaymaker1907 1168 days ago
My biggest gripe with XML is that it can't easily represent arbitrary strings. Even in the latest version, XML 1.1, you can't serialize strings with embedded nulls, since the spec forbids even a character reference like "&#0;". XML 1.0 is even worse, since it also rules out most other control characters (everything below U+0020 except tab, line feed, and carriage return), and neither version allows unpaired UTF-16 surrogates. Instead, the spec writers apparently expect devs to come up with their own escaping scheme, in which case why bother having a standard at all?

Even C# just punts on this issue and won't emit valid XML if a string you serialize happens to have a null character in it.
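
To illustrate the restriction, here is a small sketch using Python's standard-library parser rather than C#, purely as an example. It checks whether a document is accepted when the NUL appears as a character reference or as a raw byte. The helper name `is_well_formed` is made up for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the stdlib (expat-based) parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# An ordinary string parses fine.
ok = is_well_formed(b"<s>hello</s>")

# A NUL written as a character reference is rejected.
nul_reference = is_well_formed(b"<s>a&#0;b</s>")

# A raw NUL byte in the content is rejected too.
raw_nul = is_well_formed(b"<s>a\x00b</s>")
```

Other parsers may report the error differently, but a conforming one should refuse both forms.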

1 comment

If I had to deal with strings that XML won't allow, I'd probably just rely on encoding the data in Base64 before throwing it into the XML.

A human won't be able to read it (unless you're crazy and have learned to read Base64), but the application can still decode it easily. You just have to add a Base64 step: encode before serialization and decode after deserialization.

It's very annoying to do that, though, since it adds a bunch of logic to the application and also removes the benefit of being able to read the strings in the XML as a human.
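
For what it's worth, the Base64 step described above could look roughly like this in Python. It's only a sketch: the element name `value`, the `encoding="base64"` attribute, and the function names `to_xml` and `from_xml` are all hypothetical choices, not part of any standard.

```python
import base64
import xml.etree.ElementTree as ET


def to_xml(value: str) -> bytes:
    """Serialize a string (possibly containing NULs) as Base64 inside XML."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    # The attribute is just a hint for readers; nothing standard about it.
    element = ET.Element("value", {"encoding": "base64"})
    element.text = encoded
    return ET.tostring(element)


def from_xml(document: bytes) -> str:
    """Parse the XML and decode the Base64 payload back to the original string."""
    element = ET.fromstring(document)
    # An empty payload comes back as None, so fall back to "".
    return base64.b64decode(element.text or "").decode("utf-8")


original = "before\x00after"
document = to_xml(original)
restored = from_xml(document)
```

The NUL survives the round trip because the XML only ever contains Base64's ASCII alphabet, at the cost of the payload no longer being readable in the file.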