by muchcomment
4012 days ago
Had my first encounter with \v while importing a legacy database just a few weeks ago. The data was passed on to us as a batch of XML files. For some reason our XML parsing library would just ignore the rest of the file once it reached the \v character. Took me some time to find the culprit. Edit: The \v character had somehow made it into one of the descriptions for one of the user profiles.
|
But you should have gotten an error, of course, not the silent truncation you imply.
If you need to salvage the character, your XML library may let you specify it as &#x0B;. That is still a violation, but a lot of libraries seem to let it through: http://www.w3.org/TR/REC-xml/#sec-references (see "Well-formedness constraint"... you are specifically not allowed to use this to do what I'm suggesting here).
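To see which camp your own parser falls into, a quick check along these lines may help. This is only a sketch using Python's standard library parser (which is backed by expat); the helper name `is_well_formed` and the sample documents are made up for illustration, and other libraries may well behave differently.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False if it raises."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents: a raw vertical tab (U+000B) in the text,
# and the same character smuggled in as a character reference.
raw_vt = "<description>line one\x0bline two</description>"
ref_vt = "<description>line one&#x0B;line two</description>"

for label, doc in [("raw \\v", raw_vt), ("&#x0B;", ref_vt)]:
    print(label, "accepted" if is_well_formed(doc) else "rejected")
```

A strict parser should report both as rejected. If yours accepts either one, or stops reading without raising anything, that is worth knowing before you trust it with legacy data.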
Anyways, the moral here is that XML CANNOT carry arbitrary binary data, and EVERY TIME you output something in XML, something in the system needs to run some sort of encoding and illegal-character cleaning pass on the output text. The moral equivalent of "<tag>$content</tag>" in your language is ALWAYS wrong, unless you specifically processed $content into XML character content earlier. This is true even when you're really sure $content is "safe". Even if you're right... and statistically speaking, you're not... do it correctly anyhow and call the right encoding function.
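For what it's worth, here is roughly what such a pass can look like in Python. Treat it as a sketch, not the one true way: the names `xml_text` and `wrap` are invented for this example, and dropping the illegal characters (rather than replacing them or raising an error) is an assumption you may not want for your data. The character ranges follow the XML 1.0 rule that control characters other than tab, LF and CR are not allowed.

```python
import re
from xml.sax.saxutils import escape

# Characters XML 1.0 does not allow in a document: C0 controls other than
# tab (0x09), LF (0x0A) and CR (0x0D), lone surrogates, U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)

def xml_text(content: str) -> str:
    """Turn arbitrary text into XML character content.

    Assumption: illegal characters are silently dropped. Replacing them
    or raising an error are equally reasonable policies.
    """
    cleaned = _ILLEGAL_XML_CHARS.sub("", content)
    # escape() converts &, < and > into &amp;, &lt; and &gt;.
    return escape(cleaned)

def wrap(tag: str, content: str) -> str:
    """The safe version of "<tag>$content</tag>" (tag is assumed trusted)."""
    return f"<{tag}>{xml_text(content)}</{tag}>"

print(wrap("description", "Tom & Jerry\x0b <3"))
```

In practice, building the document with a real XML writer or serializer is usually the better route, since it handles escaping for you. Many of them still do not strip illegal control characters, though, so the cleaning step may be needed either way.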