by davepeck 1014 days ago
This is neat, thanks. I was looking at your HN OPML file on GitHub [0] and noticed that the `xmlURL` and `htmlURL` attributes are the same for each entry; the `htmlURL` currently points to the feed rather than the site. Do you happen to have the original HTML URLs available? Would be nice to have both.

(Secondarily, I'm guessing some of the `type` attributes should probably be "atom" rather than "rss"?)

[0] https://github.com/outcoldman/hackernews-personal-blogs/blob...

1 comments

> I was looking at your HN OPML file on GitHub [0] and noticed that the `xmlURL` and `htmlURL` attributes are the same for each entry; the `htmlURL` currently points to the feed rather than the site. Do you happen to have the original HTML URLs available? Would be nice to have both. (Secondarily, I'm guessing some of the `type` attributes should probably be "atom" rather than "rss"?)

I'm just using a file that someone else made. I guess their code doesn't distinguish between those two URLs, though it shouldn't be too hard to modify: https://github.com/outcoldman/hackernews-personal-blogs/blob...
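As a rough way to see which entries are affected, something like the following could list every outline whose HTML URL is missing or identical to its feed URL. This is only a sketch: the function names and the sample data are made up for illustration, and the attribute lookup is case-insensitive because I'm assuming files in the wild vary between `xmlUrl` and `xmlURL`.

```python
import xml.etree.ElementTree as ET


def attr(element, name):
    """Look up an attribute while ignoring case (xmlUrl vs. xmlURL)."""
    for key, value in element.attrib.items():
        if key.lower() == name.lower():
            return value
    return None


def outlines_missing_site_url(opml_text):
    """Return feed URLs whose HTML URL is absent or just repeats the feed URL."""
    root = ET.fromstring(opml_text)
    affected = []
    for outline in root.iter("outline"):
        xml_url = attr(outline, "xmlUrl")
        html_url = attr(outline, "htmlUrl")
        if xml_url and (not html_url or html_url == xml_url):
            affected.append(xml_url)
    return affected


# Hypothetical sample data, not taken from the real OPML file.
SAMPLE = """<opml version="2.0"><body>
<outline type="rss" text="A" xmlURL="https://a.example/feed.xml" htmlURL="https://a.example/feed.xml"/>
<outline type="rss" text="B" xmlUrl="https://b.example/atom.xml" htmlUrl="https://b.example/"/>
</body></opml>"""

print(outlines_missing_site_url(SAMPLE))
```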

It is also true that RSS, Atom, and possibly other feed types are mixed in there. What I did for my site was crawl all of those feeds and process them one by one: fetch all of the posts, order and group them, and output everything as RSS feeds in a consistent format.
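To illustrate the general idea (this is not the actual code behind my site, just a minimal sketch with hypothetical function names), the feed type can be read off the root element, and posts from either format can be reduced to the same title/link shape. It only handles RSS 2.0 and Atom; anything else, such as RSS 1.0/RDF, comes back as "unknown" with no posts.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace


def detect_feed_type(feed_text):
    """Return "rss", "atom", or "unknown" based on the root element."""
    root = ET.fromstring(feed_text)
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM + "feed":
        return "atom"
    return "unknown"


def extract_posts(feed_text):
    """Reduce RSS items or Atom entries to a list of {"title", "link"} dicts."""
    root = ET.fromstring(feed_text)
    posts = []
    if root.tag == "rss":
        for item in root.findall("./channel/item"):
            posts.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            link = ""
            for candidate in entry.findall(ATOM + "link"):
                # A link without rel counts as rel="alternate" in Atom.
                if candidate.get("rel", "alternate") == "alternate":
                    link = candidate.get("href", "")
                    break
            posts.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link,
            })
    return posts
```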

For example, here's the top 100 user feeds for 2023: https://hn-blogs.kronis.dev/feed-top100.xml

Those have HTML links for each of the posts, though I'm afraid that's not exactly what you're asking for (the HTML URLs for the sites themselves), because I don't store those anywhere. The original feeds should carry that information too, though.
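If someone wanted to recover the site URLs from the feeds, a sketch along these lines might work. It assumes RSS 2.0 feeds put the site address in the channel-level `<link>` and Atom feeds use a feed-level `<link rel="alternate">`; feeds that omit either will return `None`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace


def site_url_from_feed(feed_text):
    """Return the site (HTML) URL declared by an RSS 2.0 or Atom feed, if any."""
    root = ET.fromstring(feed_text)
    if root.tag == "rss":
        # Only matches the plain <link>, not a namespaced <atom:link rel="self">.
        return root.findtext("./channel/link")
    if root.tag == ATOM + "feed":
        for link in root.findall(ATOM + "link"):
            # A link without rel counts as rel="alternate" in Atom.
            if link.get("rel", "alternate") == "alternate":
                return link.get("href")
    return None
```

The result could then be written back into the `htmlURL` attribute of the matching OPML entry.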