|
|
|
|
|
by jackienotchan
478 days ago
|
|
AI agents have lead to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN. Do you have any built-in features that address these issues? |
|
On most platforms, browser use only requires the interactive elements, which we extract, and does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
Our goal is to abstract backend functionality from webpages. We could cache this, and only update the cache if eTags change.
Websites that really don't want us will come up with audio captchas and new creative methods.
Agents are different from bots. Agents are intended as a direct user clone and could also bring revenue to websites.