| HN Mirror

> I analyzed the feeds of some of the top contributors on HN, and almost all had something wrong with them.

I’m sceptical about your analysis, because your tool makes spurious complaints about my feed <https://chrismorgan.info/feed.xml> which show that it’s not parsing XML correctly. For stupid reasons¹ that I decided not to fix or work around, many of the slashes are encoded as /, which is perfectly valid, but your tool fails to decode the character references inside attribute values. I don’t know what dodgy parser you’re using, it’s possible this is the only thing it gets wrong about parsing XML², but it doesn’t instil confidence. I would expect a strict XML parser to be more reliable. I’ve literally only once encountered a feed that was invalid XML³. Liberal parsing is not a virtue, it’s fragile in a different way. Postel was wrong.

—⁂—

¹ I wish OWASP’s XSS protection cheat sheet had never been written. I will say no more.

² Honestly, parsing XML isn’t very hard; once you’re past the prologue, there are literally only about seven simple concepts to deal with (element, attribute, text, processing instructions, comments, cdata, character/entity references), with straightforward interactions. Not decoding references in attribute values is a mind-boggling oversight to me.

³ WordPress thinks it’s okay to encode U+0003 as  in an XML 1.0 document.