by londons_explore 613 days ago
I wonder if popular software for generating RSS feeds might not be setting the correct Content-Type header? Maybe this whole issue could be mostly fixed by a few GitHub PRs...
4 comments

Correct might be debatable here as well. My blog for example sets Content-Type to text/xml, which is not exactly wrong for an RSS feed (after all, it is text and XML) and IIRC was the default back then.

There were compatibility issues with other type headers, at least in the past.

I think the current correct content types are:

'application/rss+xml' (for RSS)

'application/atom+xml' (for Atom)

Sounds like a kind Samaritan could write a scanner to find as many feeds as possible that look like RSS/Atom but aren't served with these content types, then go and patch the hosting software those feeds use to send the correct content types, or ask the webmasters to fix it if they're home-made sites.

As soon as a majority of sites use the correct types, clients can start requiring them for newly added feeds, which in turn will push webmasters to get it right if they want their feeds to work.
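
A rough sketch of the per-feed check such a scanner might run, in Python. Everything here is an assumption for illustration: the function names (`sniff_feed_kind`, `check_feed`) are made up, it only looks at the root element, and it ignores RSS 1.0 (RDF) feeds entirely. Fetching the feeds is left out; it just takes the Content-Type header and the response body.

```python
import xml.etree.ElementTree as ET

# Media types suggested in this thread (hypothetical lookup table).
EXPECTED = {
    "rss": "application/rss+xml",
    "atom": "application/atom+xml",
}
ATOM_NS = "http://www.w3.org/2005/Atom"


def sniff_feed_kind(body: bytes):
    """Return 'rss', 'atom', or None, judging only by the root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML, so not treated as a feed
    if root.tag == "rss":
        return "rss"
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    return None


def media_type(header: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalise case."""
    return header.split(";", 1)[0].strip().lower()


def check_feed(content_type_header: str, body: bytes):
    """Compare the served media type with the one expected for the feed kind.

    Returns None when the body does not look like RSS or Atom.
    """
    kind = sniff_feed_kind(body)
    if kind is None:
        return None
    served = media_type(content_type_header)
    return {
        "kind": kind,
        "served": served,
        "expected": EXPECTED[kind],
        "ok": served == EXPECTED[kind],
    }
```

Calling `check_feed("text/xml; charset=utf-8", body)` on an RSS 2.0 body would flag it as a mismatch, which is exactly the debatable case mentioned upthread.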

Not even Cloudflare's own blog uses those, https://blog.cloudflare.com/rss/, or am I getting a wrong content type shown in my dev tools? For me it is `application/xml`. So even if `application/rss+xml` were the correct type by an official spec, it's not something to rely on if it's not commonly used.
I just checked Wikipedia and it says Atom's is 'application/atom+xml' (also confirmed in the IANA registry), and RSS's is 'application/rss+xml' (but it's not registered yet, and 'text/xml' is also used widely).

'application/rss+xml' seems to be the best option though in my opinion. The '+xml' in the media type tells (good) parsers to fall back to using an XML parser if they don't understand the 'rss' part, but the 'rss' part provides more accurate information on the content's type for parsers that do understand RSS.
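
To make the '+xml' fallback concrete, here is a minimal sketch in Python of how a client might pick a parser from the header. `pick_parser` and its return values are hypothetical names, not any real library's API, and treating 'text/xml' and 'application/xml' as generic XML is my assumption about how a lenient client would behave.

```python
def pick_parser(content_type_header: str) -> str:
    """Choose a parser name from a Content-Type header value.

    Specific feed types win; anything else ending in '+xml' (or a plain
    XML type) falls back to a generic XML parser.
    """
    # Drop parameters like '; charset=utf-8' and normalise case.
    media = content_type_header.split(";", 1)[0].strip().lower()

    if media == "application/rss+xml":
        return "rss"
    if media == "application/atom+xml":
        return "atom"
    # The structured-syntax suffix says "this is XML underneath".
    if media.endswith("+xml") or media in ("text/xml", "application/xml"):
        return "xml"
    return "unknown"
```

A client that knows nothing about RSS would skip the first two branches and still end up with a usable XML parser, which is the whole point of the suffix.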

All that said, it's a mess.

It wouldn't. It's the role of the HTTP server to set the correct content type header.
So many feeds have crap headers and other non-spec stuff going on, and loads of clients are missing useful headers. Ugh. It seems like it should be simple; maybe that's why there are loads of naive implementations.
Quite a few feeds out there use the incorrect type of text/xml, since it works slightly better in browsers by not prompting a download.

Would not surprise me if Cloudflare lumps this in with text/html protections.