by PaulHoule 681 days ago
If you search Hacker News for YOShInOn, you'll find a lot about the last version of the product: an RSS reader written in Python that uses HTMX for the front end and SBERT embeddings plus scikit-learn for classification and clustering, and does it all with asyncio. (I was about to call it "version 1", but it has predecessors that go back about 20 years.)

I've also developed Fraxinus, which I was earlier calling an "image sorter"; it adds a bookmark manager and a web crawler. It started out as asyncio, but I had trouble with the web server blocking, so I rewrote it as synchronous code. It now runs under gunicorn with a Celery worker alongside. The first version let me import image galleries with a bookmarklet, and it has a tagging and query system designed for machine learning. (For example, you can apply a negative tag that says "this is not a member of class C", and an indeterminate tag that means "we're looking for your opinion on whether this is a member of class C".) Believe it or not, I'm thinking about moving to React or a similar GUI framework, at least for some things, because the query system can answer complex queries that take complements and intersections of many tags and attributes. That goes well beyond the "union of N tags and attributes" that is easy to do in HTMX.
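The tag semantics above can be sketched as set algebra. This is a minimal illustration under assumptions, not Fraxinus's actual code: the names `TagState`, `TagIndex`, and `evaluate`, and the nested-tuple query format, are all hypothetical. Each (tag, state) pair maps to a set of item ids, and queries combine those sets with intersection, union, and complement relative to every known item.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TagState(Enum):
    POSITIVE = "positive"            # "this is a member of class C"
    NEGATIVE = "negative"            # "this is not a member of class C"
    INDETERMINATE = "indeterminate"  # "we want your opinion on class C"


@dataclass
class TagIndex:  # hypothetical in-memory index, for illustration only
    items: set[str] = field(default_factory=set)  # universe, needed for complements
    postings: dict[tuple[str, TagState], set[str]] = field(default_factory=dict)

    def tag(self, item: str, tag: str, state: TagState) -> None:
        self.items.add(item)
        # An item holds at most one state per tag, so clear any previous state.
        for s in TagState:
            self.postings.get((tag, s), set()).discard(item)
        self.postings.setdefault((tag, state), set()).add(item)

    def lookup(self, tag: str, state: TagState) -> set[str]:
        # Return a copy so query evaluation never mutates the index.
        return set(self.postings.get((tag, state), set()))


def evaluate(query: tuple, index: TagIndex) -> set[str]:
    """Evaluate a hypothetical query tree such as ("and", q1, ("not", q2))."""
    op = query[0]
    if op == "tag":
        _, name, state = query
        return index.lookup(name, state)
    if op == "and":
        result = evaluate(query[1], index)
        for sub in query[2:]:
            result &= evaluate(sub, index)
        return result
    if op == "or":
        result = set()
        for sub in query[1:]:
            result |= evaluate(sub, index)
        return result
    if op == "not":
        # Complement relative to every item the index knows about.
        return index.items - evaluate(query[1], index)
    raise ValueError(f"unknown operator: {op!r}")


# Usage with made-up items:
idx = TagIndex()
idx.tag("img1", "cat", TagState.POSITIVE)
idx.tag("img2", "cat", TagState.NEGATIVE)
idx.tag("img3", "cat", TagState.INDETERMINATE)
for img in ("img1", "img3", "img4"):
    idx.tag(img, "outdoor", TagState.POSITIVE)

# Outdoor images not positively tagged as cats: an intersection with a complement.
outdoor_not_cat = evaluate(
    ("and", ("tag", "outdoor", TagState.POSITIVE),
            ("not", ("tag", "cat", TagState.POSITIVE))),
    idx,
)
# Items waiting for a human judgement on "cat".
needs_review = evaluate(("tag", "cat", TagState.INDETERMINATE), idx)
```

One design consequence worth noting: the complement of "positively tagged cat" also includes items nobody has tagged yet (`img4` here), so a query that wants confirmed non-members should ask for the `NEGATIVE` state instead. Arbitrarily nested `and`/`or`/`not` trees like this are the kind of query that goes beyond a flat union of N tags.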

I'm thinking about merging the code bases, which would basically mean implementing YOShInOn inside Fraxinus. I'm not in a hurry because the system works well, but the one problem I have with it now is that it runs in batch mode: it picks about 300 articles to show me, I look at them, and then it classifies another batch. When I was reading more articles, a cycle turned around in about 1.5 days, but I haven't been using it as much lately and sometimes a cycle takes 7+ days. Add up all the queues involved and an article can wait a long time between being published, my system picking it up, my favoriting it, and my scheduling it to be posted on HN or Mastodon. For most of the topics I cover that isn't so bad, because many "news" items are still interesting after two weeks, but it's a disaster for sports stories, which are often only of interest until the next game.

So I'm thinking about how to make it run in an incremental mode that can tag certain topics for fast processing, and also how to make it more compatible with ActivityPub so it can be a "Mastodon reader that doesn't suck".
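A minimal sketch of one way such an incremental mode could work, assuming each topic gets a freshness window. The one-day window for sports and the 14-day default are hypothetical values loosely based on the "until the next game" and "after two weeks" remarks above, and `IncrementalQueue` is an illustrative name, not part of YOShInOn. Articles are pushed onto a heap keyed by publication time plus window as they arrive, so the reader always pulls the most urgent items first (earliest-deadline-first ordering via Python's `heapq`) instead of waiting for a full batch.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical per-topic freshness windows: how long an article stays worth reading.
FRESHNESS = {"sports": timedelta(days=1)}
DEFAULT_WINDOW = timedelta(days=14)


@dataclass(order=True)
class Pending:
    deadline: datetime  # published time + freshness window
    seq: int            # insertion counter; breaks ties between equal deadlines
    title: str = field(compare=False)
    topic: str = field(compare=False)


class IncrementalQueue:
    """Earliest-deadline-first queue of unread articles (hypothetical design)."""

    def __init__(self) -> None:
        self._heap: list[Pending] = []
        self._seq = 0

    def add(self, title: str, topic: str, published: datetime) -> None:
        # Articles are pushed as they arrive instead of waiting for a full batch.
        window = FRESHNESS.get(topic, DEFAULT_WINDOW)
        heapq.heappush(self._heap, Pending(published + window, self._seq, title, topic))
        self._seq += 1

    def drop_stale(self, now: datetime) -> list[Pending]:
        # The heap's smallest element has the earliest deadline, so stale
        # articles can be removed by popping from the front.
        stale = []
        while self._heap and self._heap[0].deadline < now:
            stale.append(heapq.heappop(self._heap))
        return stale

    def next_batch(self, n: int) -> list[Pending]:
        # Hand out up to n articles, most urgent deadline first.
        return [heapq.heappop(self._heap) for _ in range(min(n, len(self._heap)))]


# Usage with made-up articles:
t0 = datetime(2024, 5, 1)
q = IncrementalQueue()
q.add("Long read on RSS readers", "tech", t0)
q.add("Recap of last night's game", "sports", t0 + timedelta(hours=6))
q.add("Paper on sentence embeddings", "ml", t0 - timedelta(days=3))
first_two = [p.title for p in q.next_batch(2)]
```

With these made-up articles the sports recap is handed out first even though it was published last. The batch size `n` is the knob between the two modes: small batches behave incrementally, while a large `n` resembles the existing batch cycle.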