by Ian_Kerins
99 days ago
A lot of the discussion around the /crawl endpoint seems to miss a key detail in the docs. The crawler explicitly identifies itself as a bot, respects robots.txt, and does not bypass CAPTCHAs, WAF rules, or Cloudflare Bot Management. So, technically, it's a nice managed crawling system, but in practice it only works on sites that already allow bots to crawl them. For many real-world data extraction use cases, the problem isn't crawling infrastructure; it's dealing with sites that actively block bots. In those cases you still need traditional scraping approaches.
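To make the robots.txt point concrete, here's a minimal sketch of what "respects robots.txt" means for any polite crawler. This is my own illustration using Python's standard `urllib.robotparser`, not how the /crawl endpoint is actually implemented; the robots.txt content, the `ExampleBot` user agent, the `allowed_urls` helper and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that blocks all crawlers from its product pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs a robots.txt-respecting crawler would fetch (hypothetical helper)."""
    parser = RobotFileParser()
    # Parse the rules from text instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


urls = [
    "https://example.com/about",
    "https://example.com/products/123",
]
print(allowed_urls(ROBOTS_TXT, "ExampleBot", urls))
```

If the data you actually want lives under `/products/`, a compliant crawler simply never requests it, no matter how good the crawling infrastructure is.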