Hacker News new | ask | show | jobs
by nikcub 774 days ago
Most of the top LLM already do this very well. It's because they've been trained on web data, and also because they're being used for precisely this task internally to grab data.

The complicated ops of scraping is running headless browsers, IP ranges, bot bypass, filling captchas, observability and updating selectors, etc. There are a ton of SaaS services that do that part for you.

2 comments

Agreed there are several complexities but not sure which ‘this’ you mean - specifically updating selectors is one of the areas I had in mind earlier..
There was one I remember out of UF/FSU called Intoli that seems to have pivoted into consulting.