HN Mirror



	by jackienotchan 478 days ago
	AI agents have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents or stick to any of the scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN. Do you have any built-in features that address these issues?
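
For readers unfamiliar with the two practices named above, here is a minimal sketch of what honoring robots.txt and a rate limit can look like on the client side. It uses only Python's standard library; the `example-agent` user agent string, the `PoliteFetcher` class, and the one-second default delay are assumptions made up for illustration and are not part of any tool discussed in this thread.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-agent"  # hypothetical identifier, for illustration only


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Tracks whether a URL may be fetched and how long to wait first."""

    def __init__(self, parser, user_agent=USER_AGENT, default_delay=1.0):
        self.parser = parser
        self.user_agent = user_agent
        # Prefer the site's Crawl-delay if it declares one.
        declared = parser.crawl_delay(user_agent)
        self.delay = float(declared) if declared is not None else default_delay
        self._last_request = None

    def allowed(self, url: str) -> bool:
        return self.parser.can_fetch(self.user_agent, url)

    def wait_time(self, now=None) -> float:
        """Seconds to sleep before the next request is polite."""
        now = time.monotonic() if now is None else now
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def mark_request(self, now=None) -> None:
        self._last_request = time.monotonic() if now is None else now
```

A caller would check `allowed(url)`, sleep for `wait_time()`, issue the request, and then call `mark_request()`.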

2 comments

MagMueller 477 days ago

Yes, some hosting services have experienced a 100%-1000% increase in hosting costs.

On most platforms, Browser Use only requires the interactive elements, which we extract, and does not need images or videos. We have not yet implemented this optimization, but it will reduce costs for both parties.
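
The comment says this optimization is not implemented yet, so the following is only a sketch of one way it could work, not the project's code. It assumes a Playwright-style request interception API in which a handler receives a `route` object exposing `route.request.resource_type`, `route.abort()`, and `route.continue_()`; the names `BLOCKED_RESOURCE_TYPES`, `should_block`, and `handle_route` are hypothetical.

```python
# Resource types the agent does not need in order to find interactive elements.
BLOCKED_RESOURCE_TYPES = {"image", "media"}


def should_block(resource_type: str) -> bool:
    """Return True for requests that can be skipped to save bandwidth."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def handle_route(route) -> None:
    """Abort image/video requests and let everything else through.

    With Playwright's sync API this handler would be registered as
    page.route("**/*", handle_route); that wiring is an assumption here.
    """
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()
```

Skipping these requests lowers egress for the site and page-load time for the agent, which is the shared saving the comment refers to.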

Our goal is to abstract backend functionality from webpages. We could cache this, and only update the cache if ETags change.
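
The ETag idea maps onto standard HTTP conditional requests: the client sends the stored ETag in an `If-None-Match` header, and the server answers `304 Not Modified` with no body if nothing changed. Below is a minimal sketch under stated assumptions: the `fetch(url, headers)` transport signature, the `ETagCache` class, and the plain-dict headers keyed by `"ETag"` are all hypothetical and chosen so the logic can be read without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical transport: fetch(url, request_headers) -> (status, headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class CachedPage:
    etag: str
    body: bytes


class ETagCache:
    """Re-downloads a page only when the server reports a new ETag."""

    def __init__(self, fetch: Fetch):
        self._fetch = fetch
        self._pages: Dict[str, CachedPage] = {}

    def get(self, url: str) -> bytes:
        cached = self._pages.get(url)
        # Ask the server to skip the body if our copy is still current.
        headers = {"If-None-Match": cached.etag} if cached else {}
        status, resp_headers, body = self._fetch(url, headers)
        if status == 304 and cached:
            return cached.body  # unchanged: serve from cache
        etag = resp_headers.get("ETag")
        if status == 200 and etag:
            self._pages[url] = CachedPage(etag, body)  # refresh cache
        return body
```

A 304 response still costs the site one request, but it avoids re-sending and re-processing the page body.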

Websites that really don't want us will come up with audio captchas and new creative methods.

Agents are different from bots. Agents are intended as a direct user clone and could also bring revenue to websites.


erellsworth 477 days ago

>Websites that really don't want us will come up with audio captchas and new creative methods.

Which you or other AIs will then figure out a way around. You literally mention "extract data behind login walls" as one of your use cases so it sounds like you just don't give a shit about the websites you are impacting.

It's like saying, "If you really don't want me to break into your house and rifle through your stuff, you should just buy a more expensive security system."


gregpr07 477 days ago

imo if the website doesn't want us there the long term value is anyway not great (maybe the exception is SERP APIs or something similar, which exist exclusively because the Google search API is brutally expensive).

> extract data behind login walls

We mean this more from the perspective of companies that want the data extracted but are stuck behind a login wall. For example (actual customer): "I am a compliance company that has a system from 2001, and interacting with it is really painful. Let's use Browser Use to use the search bar, download data and report back to me".

I believe in the long run agents will have to pay for the data from website providers, and then the incentives are once again aligned.


erellsworth 477 days ago

> imo if the website doesn't want us there the long term value is anyway not great

Wat? You're saying if a website doesn't want you scraping their data then that data has low long-term value? Or are you saying something else, because that makes no fucking sense.


gregpr07 477 days ago

Haha no, I am saying that if websites don’t want you there they will find a way to block you in the long run, so betting the product on extracting data from those websites is a bad business model (probably)


xena 477 days ago

It would be really nice if you made some easy way for administrators to tell that a client is using Browser Use so they can administratively block that tool. I mean, unless you want to pay for the infrastructure improvements to the websites your product is assaulting.
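
To make the request concrete: if an agent declared itself with a stable token in its User-Agent header, blocking it would take a few lines on the server. The sketch below is a plain WSGI middleware. The token `browser-use` is purely hypothetical (nothing in this thread says the tool sends such a token), and `is_blocked` and `block_agents` are names invented for illustration.

```python
# Hypothetical identifying token; the tool would have to send it voluntarily.
BLOCKED_TOKENS = ("browser-use",)


def is_blocked(user_agent: str, tokens=BLOCKED_TOKENS) -> bool:
    """Case-insensitive substring match against the User-Agent header."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


def block_agents(app, tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so that self-identified agents receive a 403."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated agents are not permitted.\n"]
        return app(environ, start_response)

    return middleware
```

This only works against clients that identify themselves honestly, which is exactly why the comment asks the tool's authors to provide the signal.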


deadfece 475 days ago

In my experience these web agents are relatively expensive to run and are very slow. Admittedly I don’t browse HN frequently, but I’d be interested to read some of these agent abuse stories, if any stand out to you. (I’ve been googling for AI agent website abuse stories and not finding anything so far.)
