by spdustin 4011 days ago
I've dealt with vertical tabs and linefeeds by just Base64-encoding character data that might include them before stuffing it into a CDATA node in the XML doc.

It's a hack, sure, having to encode/decode all the time, but if you need to store those characters, it's the only bulletproof way I've found.

2 comments

I have to admit I'm still kind of split on whether XML made the right call here. It's tricky with character encodings to allow arbitrary binary in the characters, but something like CDATA could have permitted it, perhaps with a shell-like specification of a terminating byte sequence, or even with a UTF-8-style prefix number that indicates the length. This sounds great to me at first. But then I put on my security hat and consider what horrors would transpire in the bowels of programs that are unprepared to handle binary, or that can somehow be tricked by differences between validation and parsing, or any number of other nightmares one could create with this, and I go back to neutral-at-best. (I'd go negative, but on the other, other hand [1], a lot of these things are already happening as people blithely stuff this data into XML documents anyhow, standard or no.)

[1]: No, not gripping hand... that's only for when the third choice is the dominant/default/obviously-correct-once-I-say-it choice.

Yep, Base64 encoding is a correct and fairly standard way of embedding binary data in XML.

Always makes me nostalgic for Usenet. Which, yes, technically used uuencode back in the Usenet days; it has some slight technical differences from Base64.
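To make the "slight technical differences" concrete, here is a small Python sketch comparing the two encodings on the same three bytes. It uses the standard `base64` and `binascii` modules; the sample input `b"Cat"` is just an illustrative choice. Both schemes split the input into 6-bit groups, but uuencode maps each group to the ASCII character at value + 32 and prefixes each line with a length character, while Base64 uses its own 64-character alphabet and no per-line length.

```python
import base64
import binascii

data = b"Cat"

# Base64: 6-bit groups mapped through the A-Z, a-z, 0-9, +, / alphabet.
b64 = base64.b64encode(data)   # b'Q2F0'

# uuencode (one line): a length character, then 6-bit groups offset by 32,
# then a trailing newline.
uu = binascii.b2a_uu(data)     # b'#0V%T\n'

# Both decode back to the same bytes.
from_b64 = base64.b64decode(b64)
from_uu = binascii.a2b_uu(uu)
```

The uuencode alphabet includes characters such as `<` and `&`, so unlike Base64 it is not safe to drop into XML text without further escaping.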