Hacker News
by lapcat 17 days ago
I come from a very different, old-school perspective, because I hand-write my blog posts in HTML and also hand-write my RSS feed in XML.

I've found CDATA invaluable, because I can just copy and paste the content from the HTML file to the XML file. I've never used the CDATA terminator characters in a blog post, so that's a non-problem.

1 comments

This is mostly about when you write your automated feed generator.
> This is mostly about when you write your automated feed generator.

Yes, that's why I said, "I come from a very different, old-school perspective."

However, I don't find the points persuasive:

1. A special case for the CDATA terminator doesn't seem any worse than special cases for every HTML character that needs to be escaped in XML.

2. I'm not sure who exactly the hypothetical misled people are (straw men?) who would think "the content is raw HTML or somehow safer."

3. I'm not sure how splitting CDATA blocks is "less uniform" than escaping characters, or why less uniform output is a downside, especially as you state in another comment, "IMHO, RSS is for feed readers, not humans."

4. I'm not sure how CDATA makes "debugging confusing," and in any case using CDATA blocks inside an article seems like a pretty rare case; like I said, I haven't done that myself.
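
For comparison, points 1 and 3 can be sketched in Python. This is only an illustration under assumptions: `wrap_cdata` is a hypothetical helper name, and the splitting shown is the usual way to handle the terminator, not code from any particular feed generator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting around any "]]>"."""
    # Each "]]>" is cut in two: "]]" stays at the end of one section and
    # ">" starts the next, so no single section contains the terminator.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

sample = "<p>a ]]> b & c</p>"

cdata_doc = "<description>" + wrap_cdata(sample) + "</description>"
escaped_doc = "<description>" + escape(sample) + "</description>"

# Both forms decode to the same text once a parser reads them.
cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text
```

Either way there is exactly one rule to remember: one string replacement for CDATA, or the `&`, `<`, `>` replacements that `escape` performs.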

The argument is from the generator implementation point of view. Regular escaping is much simpler. Unlike regular escaping, CDATA cannot be used for attribute values. CDATA can quickly become a footgun because it gives you a false sense of security. Regular escaping is much more universal: it also works for HTML content and attribute values.
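
To illustrate the generator-side argument, here is a minimal Python sketch that assumes the standard library's `xml.sax.saxutils`. The `render_entry` function and its element names are made up for the example and do not follow any real feed format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render_entry(title: str, href: str, body_html: str) -> str:
    """Hypothetical serializer: one escaping rule for text, one for attributes."""
    return (
        "<entry>"
        f"<title>{escape(title)}</title>"
        # quoteattr escapes the value and adds the surrounding quotes itself.
        f"<link href={quoteattr(href)}/>"
        f"<description>{escape(body_html)}</description>"
        "</entry>"
    )

xml_text = render_entry(
    "A & B",
    'https://example.com/?a=1&b="x"',
    "<p>x ]]> y</p>",
)

# A parser recovers the original strings from both text and attribute positions.
entry = ET.fromstring(xml_text)
```

The attribute value goes through the same kind of escaping as the text nodes, which is the position where a CDATA section is not available at all.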

> I'm not sure how CDATA makes "debugging confusing," and in any case using CDATA blocks inside an article seems like a pretty rare case; like I said, I haven't done that myself.

Debugging can be confusing when you actually encounter that closing sequence: the text becomes much less readable and looks kind of broken. With regular escaping, the content is easier to distinguish from the structural XML (even though it is harder to read). Actually, its rarity can be more of a problem, because you might never learn about it and never handle it in your own serializer at all. The "magic" of CDATA is dangerous. You might not believe it, but many developers still don't do any proper escaping when injecting text into DOM elements. They often do element.innerHTML = "Some untrusted text". I have seen such things countless times.
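
The "rare case you never handle" can be sketched in Python as well. The `naive_cdata` serializer and the `round_trips` checker below are hypothetical names invented for this illustration; the parser is the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def naive_cdata(text: str) -> str:
    """Hypothetical serializer that forgets about the "]]>" terminator."""
    return "<![CDATA[" + text + "]]>"

def round_trips(encode, text: str) -> bool:
    """Return True if a parser reads back exactly the text that was encoded."""
    try:
        return ET.fromstring("<d>" + encode(text) + "</d>").text == text
    except ET.ParseError:
        return False

ordinary = "<p>hello</p>"
hostile = "x ]]> <b>injected</b>"

ordinary_ok = round_trips(naive_cdata, ordinary)  # the common case works
hostile_ok = round_trips(naive_cdata, hostile)    # the rare case does not
escaped_ok = round_trips(escape, hostile)         # plain escaping handles both
```

Because ordinary posts pass, a serializer like this can run for years before the first input containing the terminator exposes the gap.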