by PaulHoule 946 days ago
Failing RSS readers are still failing with the same failing interfaces that have been failing since 1999, and the strange thing is that there is very little insight about the phenomenon from either the people who write RSS readers or their potential users.

Two perennially unpopular but strangely persistent interfaces are (1) the RSS reader that looks like an email or (dead) Usenet reader, where the cause of death is the "mark as read" button that makes every item the system ingests a burden to the reader (for all this click, click, clicking, the system is not gathering any information about the feed items or the user's preferences), and (2) the RSS reader that renders 200 separate lists for 200 separate RSS feeds. If your plan is to scan, scan, and scan some more, why not just visit the actual web sites?

Contrast that with the successful interface of Twitter, where new content displaces old content, and where if you walk away for a week you see recent content and don't need to click, click and click to mark a week's worth of content as read (though there hopefully is a button to nuke everything). And now there is TikTok, and RSS readers still barrel on as if the last 15 years didn't happen.

RSS needs algorithmic feeds to really be superior to "I have a list of 50 blogs I check every morning." That is, you have to be able to ingest more feed items than you can handle and have the important and interesting stuff float to the top.

I was involved a bit in text classification research 20 years ago, and it was clear to me that an algorithmic feed for RSS was very possible, with the caveat that it would take a few thousand judgements. Yet even at that time it was clear that you should never underestimate people's laziness when it comes to making training sets. Most people would expect to give five or so judgements.

I had thought about the problem for years, a bit about ideas that would improve low-judgement performance, but I did very little other than this project

https://ontology2.com/essays/HackerNewsForHackers/

and

https://ontology2.com/essays/ClassifyingHackerNewsArticles/

Last December I started working on YOShInOn, my smart RSS reader and intelligent agent with a primary user interface that looks like "TikTok for text" (no wasted clicks to 'mark as read' because I am always collecting preference information.) I started out with the same model from the article above (applied to the whole RSS snippet as opposed to just the title) and upgraded to a BERT-based model.

It shows me the top 5% of about 3000 ingested articles a day.
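
To make the "top 5% of about 3000" arithmetic concrete, here is a minimal sketch of that kind of cut. It is not YOShInOn's code: the function name `top_percent`, the `(score, title)` pairs and the made-up scores are all assumptions for illustration, and any model that emits one score per article would slot in.

```python
import heapq

def top_percent(scored_items, percent=5):
    """Return the best `percent` of (score, item) pairs, highest score first."""
    scored_items = list(scored_items)
    # Integer ceiling of len * percent / 100, so 3000 items at 5% gives 150.
    k = (len(scored_items) * percent + 99) // 100
    return heapq.nlargest(k, scored_items, key=lambda pair: pair[0])

# Hypothetical day of 3000 ingested articles with made-up, distinct scores.
articles = [((i * 37 % 3000) / 3000, f"article-{i}") for i in range(3000)]
shown = top_percent(articles)
print(len(shown))  # 150
```

The other 2850 items never need a "mark as read" click; they simply never surface.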

I am still thinking about how to productize it. On an intellectual level I'm interested in the problem of training on fewer examples. Ironically it wouldn't be hard to do experiments because I have a year of data I can sample from, but I couldn't care less on a practical level because I have 50,000 judgements already... And it is all about "pain driven development" where I work on features that I want right now for me.

If I were going to make a SaaS version of it, I would almost certainly fall back on collaborative filtering (pooling preferences from multiple users) because users would perceive it to learn more quickly.

See the entirely forgotten https://en.wikipedia.org/wiki/StumbleUpon
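
For readers who haven't seen collaborative filtering, here is a toy sketch of the idea of pooling preferences from multiple users. Everything in it is an assumption for illustration (user names, article ids, +1/-1 judgements, plain cosine similarity); it is one textbook user-based variant, not a description of what a real service would ship.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two {item: rating} dicts (missing = 0)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else 0.0

# Hypothetical judgements: +1 = liked, -1 = disliked.
ratings = {
    "alice": {"a1": 1, "a2": -1, "a3": 1},
    "bob":   {"a1": 1, "a2": -1, "a4": 1},
    "carol": {"a1": -1, "a2": 1, "a4": -1},
    "dave":  {"a1": 1},  # brand-new user with a single judgement
}
score = predict("dave", "a4", ratings)  # positive: dave agrees with bob, not carol
```

Here `dave` gets a usable prediction after one judgement because the pool already knows who he resembles, which is the "perceived to learn more quickly" effect.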

1 comments

Have you tried Feedly? One of the sort options is "most popular", although I'm not sure how that works. Older unread stuff ages out automatically after about a month.
Popularity is a useful metric for ranking (if A is more popular than B, I am more likely to prefer A to B than the reverse), but combining it with a relevance score can be tricky (e.g. it is not so straightforward to incorporate PageRank into a web search engine and really get better results).
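
To show where the trickiness comes from, here is a naive sketch of one way to blend the two signals: a weighted sum of relevance and log-scaled popularity. The function `blended_score`, the `weight` parameter and the sample numbers are all assumptions for illustration; this is not how Feedly or any search engine does it.

```python
import math

def blended_score(relevance, popularity, max_popularity, weight=0.2):
    """Mix relevance in [0, 1] with log-scaled popularity in [0, 1]."""
    if max_popularity > 0:
        pop = math.log1p(popularity) / math.log1p(max_popularity)
    else:
        pop = 0.0
    return (1 - weight) * relevance + weight * pop

# Hypothetical items: (label, relevance score, raw popularity count).
items = [
    ("niche but relevant", 0.9, 10),
    ("popular but off-topic", 0.3, 10_000),
]
max_pop = max(p for _, _, p in items)
ranked = sorted(items, key=lambda it: blended_score(it[1], it[2], max_pop), reverse=True)
```

With `weight=0.2` the relevant item wins; push the weight to 0.8 and the popular off-topic one takes over. The whole result hangs on a knob that has no obviously right setting.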

The interesting thing I see in Feedly is it seems to have a broad categorization: you might get some topic like “American Football”. I think users will certainly feel more in control if they can pick topics like that.

YOShInOn does ingest categories that are supplied by the feeds. I’ve also thought about adding a query language inspired by OWL (contains word X or word Y and is not a member of category Z), but for now, when I want to run a query, I hand-code it. If there is ever a “YOShInOn Enterprise Edition” it will have some system for maintaining multiple categorizations so it will be able to put labels like “American Football” on.
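
A hand-coded query of the shape described above (word X or word Y, and not in category Z) might look roughly like this. The `Item` class, the `matches` helper and the sample data are hypothetical, made up for illustration; they are not YOShInOn's actual data model.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Item:
    text: str
    categories: set = field(default_factory=set)

def words(text):
    """Lower-cased word set of a feed snippet."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(item, any_words, excluded_category):
    """(contains word X or word Y) and is not a member of category Z."""
    wanted = {w.lower() for w in any_words}
    has_word = bool(words(item.text) & wanted)
    return has_word and excluded_category not in item.categories

# Hypothetical feed items.
feed = [
    Item("Quarterback traded before the draft", {"American Football"}),
    Item("New draft of the RSS spec published", {"Web Standards"}),
    Item("Gardening tips for spring", {"Home"}),
]
hits = [i.text for i in feed if matches(i, ["draft", "spec"], "American Football")]
print(hits)  # ['New draft of the RSS spec published']
```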