Hacker News
by stormfather 508 days ago
> As well as an algorithm for efficiently converting/compressing large html pages into a semantic format.

For the love of humanity please open source this. This seems tremendously useful by itself.

2 comments

There is an open source alternative that might be even better: https://playwright.dev/docs/api/class-locator#locator-aria-s....
Oh damn, I will definitely look into open sourcing it and making it an SDK.
Awesome! I write LLM-powered scrapers and stuff all the time, and one of the biggest pain points is that HTML is full of so much crap that isn't meaningful and overwhelms the context. And being a data science guy, idk how to solve this.
Awesome, that's the same reason I use it. It's basically a balance between the full HTML and the Markdown-type scrapers that are better for just text. Do you mind if I reach out to you once I set up the GitHub?
You're very welcome to! Please do. You can reach out to notpricedinyet@gmail.com
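
For anyone wondering what "a balance between the full HTML and Markdown" could look like in practice, here is a minimal sketch using only Python's standard library. It is not the algorithm discussed above, which has not been published. The tag and attribute sets, the `SemanticCompressor` class, and the `compress_html` function are all hypothetical choices made for illustration: drop scripts and styles, keep a handful of meaningful tags, and keep only the attributes an LLM is likely to need.

```python
import re
from html.parser import HTMLParser

# Assumption: these sets are illustrative guesses, not the thread author's rules.
SKIP_TAGS = {"script", "style", "noscript", "svg", "template"}  # drop with contents
KEEP_TAGS = {"h1", "h2", "h3", "p", "a", "li", "button", "label", "td", "th"}
KEEP_ATTRS = {"href", "alt", "aria-label", "role"}


class SemanticCompressor(HTMLParser):
    """Collects a reduced version of the page: kept tags, kept attributes, text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a tag listed in SKIP_TAGS

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            kept = "".join(
                f' {name}="{value}"'
                for name, value in attrs
                if name in KEEP_ATTRS and value
            )
            self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse whitespace runs so layout indentation costs no tokens.
            self.parts.append(re.sub(r"\s+", " ", data))


def compress_html(html: str) -> str:
    """Return a compact string with wrapper tags, styling, and scripts removed."""
    parser = SemanticCompressor()
    parser.feed(html)
    parser.close()
    return re.sub(r" +", " ", "".join(parser.parts)).strip()
```

Unlike a Markdown converter, this keeps links and element roles, so a model can still tell a button from a paragraph. Wrapper tags such as `div` and `span`, plus `class` and `style` attributes, are dropped, while the text inside them is kept. A real page would need more care (forms, tables, hidden elements), so treat this as a starting point only.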