Hacker News
by aji 21 days ago
i never really liked CDATA but i'm not buying the argument here since you can do the escaping with replaceAll("]]>", "]]]]><![CDATA[>") instead of four replaceAlls. (assuming you are writing your own xml serializer in javascript in 2026 for some reason)
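For reference, here is a minimal sketch of that single-replacement approach in TypeScript. The function name `wrapCdata` is made up for illustration, and split/join is used as a stand-in for replaceAll so the snippet also compiles against older targets.

```typescript
// Hypothetical helper: wrap arbitrary text in a CDATA section.
// The only sequence that cannot appear inside CDATA is "]]>", so each
// occurrence is split across two adjacent sections: "]]" closes out the
// current section and ">" opens the next one.
function wrapCdata(text: string): string {
  // Same effect as text.replaceAll("]]>", "]]]]><![CDATA[>")
  const safe = text.split("]]>").join("]]]]><![CDATA[>");
  return "<![CDATA[" + safe + "]]>";
}

// Markup passes through untouched; only "]]>" needs special handling.
const plain = wrapCdata("<p>Hello</p>");
const tricky = wrapCdata("a]]>b");
```

A parser reads the second result as two adjacent CDATA sections containing `a]]` and `>b`, which concatenate back to the original text.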

Just out of curiosity, I looked at the HN RSS feed, and they still use regular entity escaping for titles (and some other fields, but not the description). That means they use two escaping mechanisms instead of one. So why not just use one?
Different requirements.

The description contains HTML markup, such as <p></p> for paragraph breaks. CDATA is a nice and clean way to encode them without breaking anything.

The title doesn't contain any markup, and shouldn't. A good old escape function covers both the "doesn't" part and the "shouldn't" part.
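A rough sketch of what such a "good old escape function" might look like in TypeScript. The name `escapeXml` is an assumption for illustration; this is not taken from HN's actual feed code.

```typescript
// Hypothetical helper: escape text for use as XML element content
// or inside a double-quoted attribute value.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first, or later entities get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// A title that happens to contain markup-like characters comes out as inert text.
const title = "<title>" + escapeXml('Tom & "Jerry" <b>') + "</title>";
```

Anything that looks like a tag in the input ends up as literal characters for the reader, which is the "shouldn't" part: markup in a title cannot survive the escape.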

What requirements are you talking about? Human readability? IMHO, RSS is for feed readers, not humans. Looking at https://news.ycombinator.com/rss , the feed isn't human-friendly at all: all line breaks are removed. The point is simplicity and uniformity. Regular escaping works well for many cases, not just the description.
That assumes that you don't have anything else to escape or sanitize.

I see people stuffing all sorts of HTML tags and nonstandard attributes in an RSS <description>, just because CDATA allows them to do so without breaking the parser. Images, videos, inline SVGs with maybe some scripts inside...

The RSS spec should never have allowed this. Reading a feed would have been much more pleasant (not to mention safer for everyone!) if the contents were required to be in plain text.

I’m not sure I understand why this is a problem. RSS is a spec for publishing a list of available content, or publishing the content directly. Formatting that content was always going to be something people wanted to do, so whether it was rich text, HTML, or what became Markdown, it was inevitable that aggregators would have to deal with both publishers wanting their publications to have styles and users wanting their aggregator software to either handle that style or hide it.

At least with a CDATA tag you're being explicitly told “here be dragons”.

I guess the difference is if you want the descriptions to be readable by simple tools, or if you assume that every reader has a full-fledged Chrome available.