I follow some indie hackers online who are in the scraping space, such as the makers of BrowserBear and ScrapingBee, and I wonder how they will fare against something like this. Their only solace is that this approach is nondeterministic, but perhaps you can simply ask the API to generate deterministic Python or JS code instead.

More generally, I wonder how many smaller startups will fare once OpenAI subsumes their product. Those running a product that's a thin wrapper on top of ChatGPT or the GPT API will find themselves at a loss once OpenAI opens up the same capability to everyone. Perhaps SaaS products that differ only slightly from the competition really were a zero-interest-rate phenomenon.

This is why it's important to have a moat. For example, I'm building a product that has some AI features (an open source email (IMAP and OAuth2) / calendar API), but it would work just fine without any of the AI parts, because the core benefit is still useful to the end user. It's similar to Notion: people will still use Notion to organize their thoughts and documents even without the Notion AI feature. Build products, not features. If you think you're the one selling pickaxes during the AI gold rush, you're mistaken; it's OpenAI selling the pickaxes (their API) to you, and you're the one actually panning for gold (looking for AI products to sell).
To put it somewhat in context, there are currently two types of scrapers: traditional HTTP-client-based ones and headless-browser-based ones. Headless browsers are used for more advanced sites, such as SPAs where there isn't any server-side rendering.
However, headless browser scraping is on the order of 10-100x more time-consuming and resource-intensive, even with careful blocking of unneeded resources (images, CSS). Wherever possible, you want to avoid headless scraping. LLMs are going to be even slower than that.
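For illustration, the "careful blocking of unneeded resources" mentioned above might look something like this in a headless setup. This is a minimal sketch assuming Playwright's Python sync API; the function names `should_block` and `fetch_rendered_html` and the exact set of blocked resource types are hypothetical choices, not anything the comment prescribes.

```python
# Hypothetical choice of resource types that rarely matter for data extraction.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "stylesheet", "font", "media"})


def should_block(resource_type: str) -> bool:
    """Return True if a request of this Playwright resource type should be aborted."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_rendered_html(url: str) -> str:
    """Render a page headlessly (assumes Playwright is installed) and return its HTML."""
    # Lazy import so the filtering logic above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        # Abort heavy assets; let documents, scripts and XHR/fetch through.
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.route("**/*", handle)  # intercept every request
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

Scripts are deliberately left unblocked, since an SPA needs them to render at all; even so, you are still paying for a full browser per page, which is why the plain HTTP route is preferable whenever it works.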
Fortunately, most sites that were client-side rendered only are moving back towards having a server renderer, and they often even embed a JSON blob of template context in the HTML for hydration. That makes your job much easier!
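As a rough sketch of why the hydration blob helps: with a plain HTTP client you can pull structured data out of one `<script>` tag instead of parsing rendered markup. This assumes the blob sits in a `<script>` element with a known `id`; the default `__NEXT_DATA__` is the id Next.js uses, but other frameworks use different markers or embed a JavaScript assignment rather than pure JSON, so treat `script_id` as something to check per site. `extract_hydration_json` is a hypothetical helper built only on the standard library.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class _ScriptTextCollector(HTMLParser):
    """Collects the raw text of the first <script> element whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.chunks: list[str] = []
        self._inside = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not self._done and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self._done = True  # only keep the first matching script

    def handle_data(self, data):
        # html.parser passes <script> contents through unparsed, possibly in several chunks.
        if self._inside:
            self.chunks.append(data)


def extract_hydration_json(html: str, script_id: str = "__NEXT_DATA__") -> Optional[Any]:
    """Return the parsed JSON from the hydration <script>, or None if it isn't present."""
    parser = _ScriptTextCollector(script_id)
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    return json.loads("".join(parser.chunks))
```

Fetch the page with any HTTP client, pass the body to `extract_hydration_json`, and you get a dict to walk directly, with no headless browser and no CSS selectors that break on the next redesign.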