by jfim 16 days ago
Prototypes aren't only for UX though. Sometimes they're for exploring whether something is technically possible, or for finding the unknown unknowns in a particular area.

For example, for personal projects, I've been wondering whether it's possible to automatically create RSS feeds for pages that don't have them (yes), what the challenges are when building an archive-style page dumping system (need to dump CSSOM alongside getOuterHTML, remove/rewrite remote content, walk iframes, automate Chrome, scroll to load lazily loaded content, etc.), and whether it's possible to train a model to remove native ads from the markdown that readability produces (no, at least not with my current approach, but using the DOM might work).
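To make the RSS case concrete, here is a rough sketch of one way it could work, not what I actually run. It assumes the page wraps each entry in an `<article>` element with a link inside, which is a guess that won't hold for every site; `LinkCollector` and `build_rss` are names made up for this example, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for links found inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._article_depth = 0   # > 0 while inside an <article>
        self._href = None         # href of the link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._article_depth += 1
        elif tag == "a" and self._article_depth and self._href is None:
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._article_depth:
            self._article_depth -= 1
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def build_rss(html, page_url, feed_title):
    """Return an RSS 2.0 document (as a string) with one item per unique link."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = f"Generated feed for {page_url}"

    seen = set()
    for href, title in parser.links:
        url = urljoin(page_url, href)  # resolve relative links
        if url in seen:
            continue
        seen.add(url)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
    return ET.tostring(rss, encoding="unicode")
```

In practice the hard part is the heuristic for what counts as an entry, plus pages that only render their content with JavaScript, which is where the Chrome automation comes back in.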


Why wouldn't you use ArchiveBox?

https://github.com/archivebox/archivebox

A few reasons. Learning is one of them, since I don't normally deal much with browser and web related technologies, so it's a good way to learn more about them.

I also think there are a few interesting things you can explore that go beyond a simple carbon copy of what's on the Internet. Ideas that I've implemented are things like automatic extraction of audio tracks, transcription, and summarization, loading a page or podcast transcript into the context window of an LLM to discuss the arguments or factuality of the claims being made, automatically turning articles into reader view using readability/trafilatura, etc.

Directions I'd like to explore would be things like multimodal search ("that page I read six months ago about computer security with neon green text on a black background", or "give me a list of fitness-related pages I've read in the last twelve months"), personal statistics (how is the mix of topics I've been reading about changing over time), annotating pages instead of just passively reading them, maybe even P2P archiving or discussions about pages, and all kinds of other things.

But installing ArchiveBox would indeed be easier.

Mostly because I only need to get a site into an RSS feed; I don't need a massive archival solution to do that.
I was today years old when I learned about this. Thank you!