by Fileformat
221 days ago
Another point: it is shocking how many feeds have errors in them. I analyzed the feeds of some of the top contributors on HN, and almost all had something wrong with them. Even RSS wizards would benefit from looking at a human-readable version instead of raw XML. I ended up writing a feed analyzer that you can try on your feed: https://www.rss.style/feed-analyzer.html

I’m sceptical about your analysis, because your tool makes spurious complaints about my feed <https://chrismorgan.info/feed.xml> which show that it’s not parsing XML correctly. For stupid reasons¹ that I decided not to fix or work around, many of the slashes are encoded as &#x2F;, which is perfectly valid, but your tool fails to decode the character references inside attribute values. I don’t know what dodgy parser you’re using; it’s possible this is the only thing it gets wrong about parsing XML², but it doesn’t instil confidence. I would expect a strict XML parser to be more reliable. I’ve literally only once encountered a feed that was invalid XML³. Liberal parsing is not a virtue; it’s fragile in a different way. Postel was wrong.
—⁂—
¹ I wish OWASP’s XSS protection cheat sheet had never been written. I will say no more.
² Honestly, parsing XML isn’t very hard; once you’re past the prologue, there are literally only about seven simple concepts to deal with (elements, attributes, text, processing instructions, comments, CDATA sections, character/entity references), with straightforward interactions. Not decoding references in attribute values is a mind-boggling oversight to me.
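To illustrate the attribute-value point, here is a minimal sketch using Python’s standard-library parser. The `SAMPLE` fragment and the `decoded_href` helper are hypothetical, made up for illustration; they are not taken from the feed or from the analyzer in question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the href encodes every "/" as the character
# reference &#x2F;, which is valid inside an attribute value.
SAMPLE = '<link rel="alternate" href="https:&#x2F;&#x2F;example.com&#x2F;feed.xml"/>'


def decoded_href(xml_text: str) -> str:
    """Parse a single element and return its href attribute as the parser sees it."""
    element = ET.fromstring(xml_text)
    # A conforming parser has already resolved character references here.
    return element.attrib["href"]


print(decoded_href(SAMPLE))
```

A tool that reports the raw `&#x2F;` sequences instead of the decoded slashes is inspecting the source text rather than the parsed attribute value.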
³ WordPress thinks it’s okay to encode U+0003 as a character reference in an XML 1.0 document, where that character is not allowed in any form.
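As a rough sketch of what strict parsing does with that kind of input, the following uses Python’s standard-library parser. The `is_well_formed` helper is hypothetical, and writing the reference as `&#3;` is an assumption for illustration; the exact form WordPress emits is not stated here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML 1.0 parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# An ordinary element parses fine.
print(is_well_formed("<title>ok</title>"))

# Assumed form of the bad reference: U+0003 is outside the XML 1.0
# character range, so a reference to it makes the document ill-formed.
print(is_well_formed("<title>bad &#3; char</title>"))
```

A liberal parser would have to guess what to do with the second input; a strict one simply rejects it.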