Hacker News
by privatenumber 798 days ago
Mozilla has released Readability as a standalone package so you can avoid spinning up a browser entirely: https://github.com/mozilla/readability
2 comments

I still wanted the browser for uBlock Origin and for handling JS-heavy sites. I was already using the standalone Readability script, but today I ended up dropping it for Trafilatura, which works a lot better.

The inefficiency of using a browser rather than just fetching the HTML doesn't really matter, because the LLM is the limiting factor here.

And yes, the LLM is essential for getting clean data. None of the existing methods is flexible enough for all cases, even if people say "you don't need AI to do this".
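
For anyone curious, here is a rough sketch of the shape of that pipeline, not my actual code. It assumes you already have the rendered HTML from the browser as a string. `trafilatura.extract` is the real Trafilatura call; `fallback_extract` is a crude stdlib stand-in I made up so the snippet runs without Trafilatura installed (it is not Readability), and `llm` is a hypothetical callable wrapping whatever model you use.

```python
from html.parser import HTMLParser
from typing import Callable


class _TextOnly(HTMLParser):
    """Crude stand-in extractor: keeps visible text, skips page chrome."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped element
        self.chunks = []  # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def fallback_extract(html: str) -> str:
    """Hypothetical fallback, used only when Trafilatura is unavailable."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def extract_main_text(html: str) -> str:
    """Prefer Trafilatura when installed; otherwise use the crude fallback."""
    try:
        import trafilatura
    except ImportError:
        return fallback_extract(html)
    # trafilatura.extract returns None when it finds no main content.
    return trafilatura.extract(html) or fallback_extract(html)


def clean_page(html: str, llm: Callable[[str], str]) -> str:
    """Extract first, then hand the text to the LLM callable for cleanup."""
    return llm(extract_main_text(html))
```

You would call it as `clean_page(rendered_html, llm=my_model_call)`, where `my_model_call` is whatever function sends a prompt to your model and returns its reply. Extracting before the LLM step keeps the prompt small, which matters since the LLM is the slow part.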

You would still need to run a browser for JS-based websites.