Hacker News
by lern_too_spel 1050 days ago
When will Brave Search launch a crawler update that lets me specifically block its crawler in robots.txt like every other search engine supports?
1 comments

I see they say "if a domain or page is not crawlable by any search engine (it has a noindex tag), or if it is not crawlable by googlebot, then Brave Search’s bot will not crawl it either." [1]

[1] https://brave.com/search/api/

Does the Brave crawler send the Googlebot or regular Chrome User-Agent string? If it sends something different from the standard Googlebot User-Agent string, you could dynamically serve a robots.txt that blocks Googlebot to every client besides Googlebot. OTOH, I've read that the Google crawler sometimes uses the regular Chrome User-Agent string and penalizes sites that return different content to Googlebot and Chrome.
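A rough sketch of the "dynamic robots.txt" idea, in case it helps. This assumes Brave's crawler does not send a Googlebot User-Agent (unconfirmed in this thread) and matches on the User-Agent header only, which anyone can spoof. The names robots_txt_for and app are made up for illustration; nothing here is Brave's or Google's documented behavior.

```python
# Hypothetical sketch: serve a different robots.txt depending on who asks.
# Assumption: the crawler we want to turn away does NOT identify as Googlebot.

# robots.txt that lets every crawler in (an empty Disallow means "allow all").
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# robots.txt that blocks Googlebot by name and allows everyone else.
# A crawler that mirrors Googlebot's rules would treat the site as blocked.
BLOCK_GOOGLEBOT = (
    "User-agent: Googlebot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Disallow:\n"
)


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for a given User-Agent header."""
    # Substring match only; this does not verify the client really is Google.
    if "googlebot" in (user_agent or "").lower():
        return ALLOW_ALL
    return BLOCK_GOOGLEBOT


def app(environ, start_response):
    """Minimal WSGI app that serves only /robots.txt."""
    if environ.get("PATH_INFO") == "/robots.txt":
        body = robots_txt_for(environ.get("HTTP_USER_AGENT", "")).encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

You could try it locally with wsgiref.simple_server.make_server("", 8000, app).serve_forever(). Note that this is exactly the kind of per-client difference the caveat above is about, so it may carry the same risk with Google.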
What if I want googlebot to crawl it but not bravebot? Every other search engine lets me block its crawler specifically. Only Brave has this shady policy.
> What if I want googlebot to crawl it but not bravebot?

Then you need to gate your content such that it is not available openly to the public.

This falls in line with many objections to Google's WEI. If you host content openly and allow access freely, then don't be surprised when people access it at will and use it for free.

Then why does bravebot obey robots.txt at all? It does, and it will respect blocks of googlebot, but it won't let you block just bravebot or just googlebot.
Hmm, I agree it's odd, but 'shady' seems to attribute malice to what could just be stupidity?
Or probably just an innocent oversight? I imagine they might have taken this decision early on when they were far too small for anybody to even think of not wanting to be crawled by them, and just never revisited the decision.
Brave has a track record of malice driven by stupidity.
You want the monopolistic tech giant to crawl you but not a small privacy-focused company? What possible justification could you have for this attitude?
If you want your robots.txt to tell bravebot to crawl your site but not googlebot, Brave puts you in the same position. You can't.

What possible justification could Brave have for this policy?

I'm conflicted - I see your point and agree; though I appreciate that by using methods of others... we don't end up with more

Loosely related XKCD: https://xkcd.com/927/