1. The Jina reader API - https://jina.ai/reader/ - prepend r.jina.ai/ to any URL to run it through their hosted conversion proxy, e.g. https://r.jina.ai/www.skeptrune.com/posts/use-the-accept-hea...
2. Applying Readability.js and Turndown via Playwright. Here's a shell script that does that using my https://shot-scraper.datasette.io tool: https://gist.github.com/simonw/82e9c5da3f288a8cf83fb53b39bb4...
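For option 1, here's a rough Python sketch of the URL-prefix trick. The helper names (`reader_url`, `fetch_markdown`) and the User-Agent string are made up for illustration, and I'm assuming the proxy returns the converted page as plain text in the response body, so treat this as a starting point rather than a reference client.

```python
from urllib.request import Request, urlopen

# Prefix described above: putting this in front of a URL routes it
# through the hosted conversion proxy.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Return the proxy URL for `target` (hypothetical helper)."""
    return READER_PREFIX + target.strip()


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """Fetch `target` through the proxy and return the body as text.

    Assumes the response body is the converted page; no retries or
    error handling beyond what urlopen raises.
    """
    req = Request(
        reader_url(target),
        headers={"User-Agent": "reader-sketch/0.1"},  # arbitrary placeholder value
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_markdown("https://example.com")` makes a real network request, so check the service's rate limits and terms before using it in bulk.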
[1]: https://github.com/JohannesKaufmann/html-to-markdown
[2]: https://github.com/devflowinc/firecrawl-simple
This is much cheaper to run on a server. For example: https://github.com/ozanmakes/scrapedown