by russellbeattie 3327 days ago
For anyone who's tried to write a real-world RSS feed reader, this format does little to solve the big problems that newsfeeds have:

* Badly formed XML? Check. There might be badly formed JSON, but I tend to think it'll be a lot less likely.

* Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries, clients are forced to use HTTP headers to check last updates, then do a delta on the entire feed if there is new or updated content. Also, if you missed 10 updates, and the feed only contains the last 5 items, then you lose information. This is the nature of a document-centric feed meant to be served as a static file. But it's 2017 now, and it's incredibly rare that a feed isn't created dynamically. A new feed spec should incorporate that reality.

* Complete understanding of modern content types besides blog posts? Miss. The last time I went through a huge list of feeds for testing, I found there were over 50 commonly used namespaces and over 300 unique fields used. RSS is used for everything from search results to Twitter posts to Podcasts... It's hard to describe all the different forms of data it can contain. The reason for this is that the original RSS spec was so minimal (there's like 5 required fields) that everything else has just been bolted on. JSONFeed makes this same mistake.

* An understanding that separate but equal isn't equal. Miss. The thing that http://activitystrea.ms got right was the realization that copying content into a feed just ends up diluting the original content formatting, so instead it just contains metadata and points to the original source URL rather than trying to contain it. If JSONFeed wanted to really create a successor to RSS, it would spec out how to send formatting information along with the data. It's not impossible - look at what Google did with AMP: They specified a subset of formatting options so that each article can still contain a unique design, but limited the options to increase efficiency and limit bugs/chaos.

This stuff is just off the top of my head. If you're going to make a new feed format in 2017, I'm sorry but copying what came before it and throwing it into JSON just isn't enough.

4 comments

FWIW, this is by Manton Reece and Brent Simmons. And Simmons is known (among other things) as the creator of NetNewsWire, which has been around for more than 15 years. He does know a bit about Atom and RSS feeds.

https://en.wikipedia.org/wiki/NetNewsWire

Ok, I have no idea who these guys are, so forgive me for being rude: if they're so good, then why did they not address those points? To my eyes, OP makes a solid argument. I'd like to know their side of the story.

But they did...

> Badly formed XML? Check. There might be badly formed JSON, but I tend to think it'll be a lot less likely.

The problem with XML is mostly that it is a very complex format so the bugs are more probable and there are more pitfalls.

> Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries ...

They actually did add tags to enable WebSub (previously called PubSubHubbub), so there goes that. For the other concerns, I think it is not the format's job to care for partial or incomplete data. Nothing prevents you from having a dynamic link with an "updatesSince" parameter on your webpage and serving all of the articles that were added or updated after that. Nowhere does the format specify a limit on the number of items. It also incorporates paging out of the box, so you could bubble up any old articles.
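A minimal sketch of the paging point, assuming the feed links to older items through a top-level `next_url` field and lists push endpoints under `hubs`, as JSON Feed describes. The URLs, the `fetch` callable, and the in-memory `PAGES` table are hypothetical stand-ins for real HTTP requests.

```python
from typing import Callable, Iterator


def iter_items(feed_url: str,
               fetch: Callable[[str], dict],
               max_pages: int = 10) -> Iterator[dict]:
    """Yield items from a feed, then from each page linked via next_url."""
    visited = set()
    url = feed_url
    # Stop on a missing next_url, a loop, or after max_pages pages.
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page = fetch(url)
        yield from page.get("items", [])
        url = page.get("next_url")


# Hypothetical feed pages held in memory instead of on a server.
PAGES = {
    "https://example.org/feed.json": {
        "version": "https://jsonfeed.org/version/1",
        "title": "Example",
        # The hub entry is illustrative; a real client would subscribe here.
        "hubs": [{"type": "WebSub", "url": "https://hub.example.org/"}],
        "next_url": "https://example.org/feed.json?page=2",
        "items": [{"id": "3"}, {"id": "2"}],
    },
    "https://example.org/feed.json?page=2": {
        "version": "https://jsonfeed.org/version/1",
        "title": "Example",
        "items": [{"id": "1"}],
    },
}

ids = [item["id"]
       for item in iter_items("https://example.org/feed.json", PAGES.__getitem__)]
print(ids)
```

A client that missed several updates could keep following `next_url` until it reaches an item it has already stored.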

> Complete understanding of modern content types besides blog posts? Miss.

The point of this is for the open web, by definition nobody can anticipate all formats. Rather than fill the spec with tweets, facebook and other types, they have opted for the least common denominator and added a specific way to add extensions. This makes way more sense.
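As a rough illustration of the extension mechanism, assuming (as I read the JSON Feed spec) that custom objects are identified by names starting with an underscore. The `_example_podcast` key and its fields are made up for this sketch.

```python
from typing import Dict, Tuple


def split_extensions(item: Dict) -> Tuple[Dict, Dict]:
    """Separate an item's core fields from underscore-prefixed extensions."""
    core = {k: v for k, v in item.items() if not k.startswith("_")}
    extensions = {k: v for k, v in item.items() if k.startswith("_")}
    return core, extensions


# Hypothetical item carrying one publisher-defined extension object.
ITEM = {
    "id": "12",
    "content_text": "Episode 12 is out.",
    "_example_podcast": {"duration_seconds": 1800},
}

core, extensions = split_extensions(ITEM)
print(sorted(core))        # fields every reader is expected to understand
print(sorted(extensions))  # fields a reader may use or safely ignore
```

A reader that does not know an extension can drop it and still render the item from the core fields.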

> An understanding that separate but equal isn't equal. Miss.

Nothing actually prevents you from leaving the content fields blank and relying on the reader to pull the format. But for this kind of usage there are other methods. Personally, I prefer content delivered in the RSS precisely to avoid having to deal with customization of content formatting. JSON Feed HAS a way to specify formatting though: it's called HTML tags. No need to reinvent the wheel here.

I don't agree with most of what you wrote, but the "it's called HTML tags" is the most wrong. You must not have tried this any time in the past 5 years or so. The embedded tags come out of CMSs and - when they're not stripped completely - look like <div class="title-main-sub-1"> and <span class="sub-article-v5-bld">. HTML isn't used alone, it's always used with CSS nowadays, and no matter if semantic tags are best practice, the fact is it's optional and regularly not used. If they're going to create a new standard format, they need to address this.

What is the difference between re-publishing the content in some other format which will do formatting well and re-publishing the content using sensible HTML tags with maybe some embedded minimal stylesheet?

There might be mis-use and abuse, but if you want to avoid that you can always push markdown into the "text" representation.

One has to wonder whether Simmons is just trying to revive the old RSS ecosystem. "What do developers like these days, JSON? Let's do RSS in JSON!" ... This does not help.

The real challenge these days is to replicate the solutions Facebook and Twitter brought to feeds (bidirectionality and data-retention in particular) in a decentralised manner that could actually become popular. Simply replicating RSS in the data-format du jour is not going to achieve that.

> Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries, clients are forced to use HTTP headers to check last updates, then do a delta on the entire feed if there is new or updated content.

This is backwards, imo. The advantage of polling over pubsub is that all complexity is offloaded to the client. This comes with its own set of problems (inefficiency of reinventing the wheel across all clients, plus every client will implement that complexity differently, resulting in countless bugs), but this is what drives adoption, which as someone else here has pointed out is all that matters. If you want adoption, you seemingly need to sacrifice a lot of efficiency in favour of making it stupidly easy to publish.
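For reference, the client-side work being discussed (check HTTP headers, then diff the feed) might look roughly like this. It is only a sketch: `fetch` is a hypothetical callable standing in for a real HTTP client, and the delta only detects new item ids, not edited items.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical transport: fetch(url, request_headers)
#   -> (status, response_headers, parsed_feed_or_None)
Fetch = Callable[[str, Dict[str, str]],
                 Tuple[int, Dict[str, str], Optional[dict]]]


@dataclass
class PollState:
    etag: Optional[str] = None
    seen_ids: Set[str] = field(default_factory=set)


def poll(url: str, state: PollState, fetch: Fetch) -> List[dict]:
    """Conditional GET, then return only items not seen before."""
    headers = {"If-None-Match": state.etag} if state.etag else {}
    status, resp_headers, feed = fetch(url, headers)
    if status == 304 or feed is None:
        return []  # server says nothing changed
    state.etag = resp_headers.get("ETag", state.etag)
    new_items = [i for i in feed.get("items", [])
                 if i["id"] not in state.seen_ids]
    state.seen_ids.update(i["id"] for i in new_items)
    return new_items


def fake_fetch(url: str, headers: Dict[str, str]):
    """Stand-in server with one fixed feed version."""
    current_etag = '"v1"'
    if headers.get("If-None-Match") == current_etag:
        return 304, {}, None
    return 200, {"ETag": current_etag}, {"items": [{"id": "b"}, {"id": "a"}]}


state = PollState()
first = poll("https://example.org/feed.json", state, fake_fetch)
second = poll("https://example.org/feed.json", state, fake_fetch)
print(len(first), len(second))
```

Nothing here recovers items that rotated out of the feed between polls, which is the loss the original comment complains about.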

The "it's 2017 now" argument doesn't really address that even with dynamically generated content, you still need every dynamic serverside platform to adopt and implement your spec independently. Static is always easier. (Plus, with the recent trend towards static sites, "it's 2017 now" actually has the opposite implication.)

Plus, you can always reuse PubSubHubbub (now WebSub[1]), which is already used in RSS/Atom feeds to provide optional subscribing to updates if both the server and client support it.

[1] https://www.w3.org/TR/websub/

> The thing that http://activitystrea.ms got right was the realization that copying content into a feed just ends up diluting the original content formatting, so instead it just contains metadata and points to the original source URL rather than trying to contain it.

It's a shame that ActivityStrea.ms hasn't had more uptake. We've added support in our enterprise social network product and think it enables some cool scenarios. But unfortunately too few other products support it these days.

> Need to continually poll servers for updates? Miss.

The point of these syndication formats (RSS, Atom, now this) was always to act as the "I'm a static site and webhooks don't exist, so poll me" equivalent of webhooks. These "pretending to be webhooks" feeds were supposed to hook into a whole ecosystem of syndication middleware that turned the feeds into things like emails.

And that—the output-products of the middleware—was what people were supposed to consume, and what sites were meant to offer people to consume. The feed, as originally designed, was not intended for client consumption. That's why the whole model we have today, where millions of "feed-reader" clients poll these little websites that could never stand up to that load, seems so silly: it wasn't supposed to be the model. RSS feeds were supposed to be a way for static-ish content to "talk to" servers that would do the syndicating for them; not a format for clients to receive notifications in.

(And we already had a format for clients to receive notifications in: MIME email. There's no reason you can't add another MIME format beyond text/plain and text/html; and there's no reason you can't create an IMAP "feed-reader" that just filters your inbox to display only the messages containing application/rss+xml representations, and set up your regular inbox to filter out those same messages. And some messages would contain both representations, so you'd see e.g. newsletters as both text in your email client and as links in your feed client, and archiving them in one would do the same in the other, since they're the same message.)
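A small sketch of that idea using Python's standard `email` package: one message carries both a text/plain body and an application/rss+xml alternative, and a "feed-reader" view keeps only messages that have the feed part. The subject, body, and XML payload are placeholders, and real IMAP fetching is left out.

```python
from email.message import EmailMessage

FEED_TYPE = "application/rss+xml"


def build_message(text: str, rss_xml: str) -> EmailMessage:
    """Build one message with a plain-text and an RSS representation."""
    msg = EmailMessage()
    msg["Subject"] = "Example newsletter"
    msg.set_content(text)  # text/plain part for ordinary mail clients
    msg.add_alternative(rss_xml.encode("utf-8"),
                        maintype="application", subtype="rss+xml")
    return msg


def has_feed_part(msg: EmailMessage) -> bool:
    """True if any MIME part of the message is an RSS representation."""
    return any(part.get_content_type() == FEED_TYPE for part in msg.walk())


newsletter = build_message("New post is up.", "<rss version='2.0'></rss>")
plain = EmailMessage()
plain.set_content("Lunch on Friday?")

inbox = [newsletter, plain]
feed_view = [m for m in inbox if has_feed_part(m)]      # feed-reader client
mail_view = [m for m in inbox if not has_feed_part(m)]  # regular inbox filter
print(len(feed_view), len(mail_view))
```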

---

The big problem I have with feeds (besides that people are using them wrong, as above) is that they have no "control-channel events" to notify a feed-consumer of something like the feed relocating to a new URL.

Right now, many feeds I follow just die, never adding a new feed item, and the reason for that is that, unbeknownst to me, the final item in the feed (that I never saw because it rotted away after 30 days, or because I "declared inbox zero" on my feeds, or whatever else) was a text post by the feed's author telling everyone to follow some new feed instead.

And other authors don't even bother with that; they use a blogging framework that generates RSS, but they're maybe not even aware that it does that for them, so instead they tell e.g. their Twitter followers, or Twitch subscribers, that they're moving to a new website, but their old website just sits there untouched forever-after, never receiving an update to point to the new site which would end up in the RSS feed my reader is subscribed to. It's nonsensical.

(And don't get me started on the fact that if you follow a Tumblr blog's RSS feed, and the blog author decides to rename their account, that not only kills the feed, but also causes all the permalinks to become invalid, rather than making them redirect... Tumblr isn't alone in this behavior, but Tumblr authors really like renaming their accounts, so you notice it a lot.)

HTTP 301 Moved Permanently is the out of band control channel. Sometimes it even seems to work, depending on software of course.
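On the reader side, honoring a permanent redirect could be as simple as the sketch below. The function name and the choice to also treat 308 as permanent are my own assumptions, and the status and headers would come from whatever HTTP client the reader uses.

```python
from typing import Dict
from urllib.parse import urljoin


def next_subscription_url(current_url: str, status: int,
                          headers: Dict[str, str]) -> str:
    """Return the URL the reader should store for future polls."""
    location = headers.get("Location")
    # Only permanent redirects should rewrite the stored subscription.
    if status in (301, 308) and location:
        return urljoin(current_url, location)  # resolves relative targets
    return current_url


old = "https://old.example.org/feed.xml"
moved = next_subscription_url(
    old, 301, {"Location": "https://new.example.org/feed.xml"})
temporary = next_subscription_url(
    old, 302, {"Location": "https://cdn.example.org/feed.xml"})
print(moved)
print(temporary)
```

A temporary redirect (302) is still followed for that one request by most HTTP clients, but the stored subscription URL stays the same.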

There was also a typical Dave-Wineresque invention of replacing the old feed with some special, non-namespaced XML with the redirect: http://www.rssboard.org/redirect-rss-feed

But of course the real problem is social. As in, people simply stop blogging or stop caring. And of course tool developers don't care if someone doesn't want to use their software anymore, and don't think of developing the right buttons for this edge case.

> HTTP 301 Moved Permanently is the out of band control channel.

True, but that requires you to be able to set response codes on the server. I can't make my GitHub Pages site, or my Tumblr blog, or my S3 bucket, emit a 301. And those are the sorts of things that RSS was designed for: static sites that can't just, say, tell their backend to email people on update. You'd think that, knowing that, RSS et al would have been designed with in-band, rather than out-of-band, control.