| HN Mirror


	by lern_too_spel 1050 days ago
	When will Brave Search launch a crawler update that lets me specifically block its crawler in robots.txt like every other search engine supports?

1 comments

blacksmith_tb 1050 days ago

I see they say "if a domain or page is not crawlable by any search engine (it has a noindex tag), or if it is not crawlable by googlebot, then Brave Search’s bot will not crawl it either."

1: https://brave.com/search/api/

cpeterso 1050 days ago

Does the Brave crawler send the Googlebot or regular Chrome User-Agent string? If it sends something different from the standard Googlebot User-Agent string, you could dynamically serve a robots.txt that blocks Googlebot to every client besides Googlebot. OTOH, I've read that the Google crawler sometimes uses the regular Chrome User-Agent string and penalizes sites that return different content to Googlebot and Chrome.
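
A minimal sketch of the dynamic robots.txt idea, assuming the Brave crawler does not send a Googlebot User-Agent and that it honors a Googlebot block in whatever robots.txt it is served (neither is confirmed in this thread). The function name and constants are hypothetical, and a User-Agent substring check is easy to spoof, so treat this as an illustration only:

```python
# Hypothetical robots.txt bodies; adjust the rules to your own site.
ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Pick the robots.txt body to serve based on the request's User-Agent.

    Assumption: clients whose User-Agent contains "googlebot" are Googlebot.
    This is NOT a reliable check, since the header can be set by anyone.
    """
    if "googlebot" in user_agent.lower():
        # Googlebot itself sees a permissive file and keeps crawling.
        return ALLOW_ALL
    # Every other client, including a crawler that mirrors Googlebot's
    # rules, sees a file in which Googlebot is blocked.
    return BLOCK_GOOGLEBOT


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("Mozilla/5.0 (X11; Linux x86_64) Chrome/115.0"))
```

Serving different files by User-Agent runs straight into the cloaking concern in the last sentence above, so whether this is safe to deploy is a separate question.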

lern_too_spel 1050 days ago

What if I want googlebot to crawl it but not bravebot? Every other search engine lets me block its crawler specifically. Only Brave has this shady policy.

hightrix 1050 days ago

> What if I want googlebot to crawl it but not bravebot?

Then you need to gate your content such that it is not available openly to the public.

This falls in line with many objections to Google's WEI. If you host content openly and allow access freely, then don't be surprised when people access it at will and use it for free.

lern_too_spel 1049 days ago

Then why does bravebot obey robots.txt at all? It does, and it will respect blocks of googlebot, but it won't allow blocking just itself or just googlebot.

blacksmith_tb 1050 days ago

Hmm, I agree it's odd, but 'shady' seems to attribute malice to what could just be stupidity?

d_theorist 1050 days ago

Or probably just an innocent oversight? I imagine they might have taken this decision early on when they were far too small for anybody to even think of not wanting to be crawled by them, and just never revisited the decision.

skilled 1050 days ago

It’s not so innocent,

https://stackdiary.com/brave-selling-copyrighted-data-for-ai...

lern_too_spel 1050 days ago

Brave has a track record of malice driven by stupidity.

gettodachoppa 1049 days ago

You want the monopolistic tech giant to crawl you but not a small privacy-focused company? What possible justification could you have for this attitude?

lern_too_spel 1049 days ago

If you want your robots.txt to tell bravebot to crawl your site but not googlebot, Brave puts you in the same position. You can't.

What possible justification could Brave have for this policy?

bravetraveler 1050 days ago

I'm conflicted - I see your point and agree; though I appreciate that by using methods of others... we don't end up with more

Loosely related XKCD: https://xkcd.com/927/
