by 1010101111001 5318 days ago
If you want to build a search engine, there's no need to crawl the web. Just crawl Google. That's what Microsoft did.

The inequities of crawling are in my opinion a huge issue with the web that receives relatively little discussion.

If a majority of websites were to ban Googlebot, Google would be brought to its knees. Slowly.

Their index would go stale.

I like Google. But I do think crawling, and who gets to do it and who does not, is an underappreciated issue. Google is very lucky.

2 comments

I'm guessing the reason this was downvoted was the first paragraph. It was meant as sarcasm. Bing fans on HN I did not expect.
How are we blocking Googlebot? If we're using robots.txt then they can simply ignore it. Googlebot can begin to identify itself differently. There are a million ways to get around a Googlebot ban and I wouldn't be the guy who thinks he's smarter than Google. You'll lose that one. They'll find a way.

But anyway, this isn't really relevant. Can you tie it in for us?
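
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example URL are all made up for illustration. The check runs on the crawler's side, so it only has an effect if the crawler chooses to run it, and it keys entirely off whatever user-agent string the crawler reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks Googlebot to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt permits this agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/page"  # placeholder URL
    print(is_allowed("Googlebot", url))
    print(is_allowed("SomeOtherBot", url))
```

A crawler that reports a different name falls under the `*` rule, and one that never calls the check is not stopped at all.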

Blocking is not done via robots.txt. It would more likely be IP-based.
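
A rough sketch of what an IP-based check could look like, using Python's standard `ipaddress` module. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical, and the ranges are reserved documentation addresses, not real crawler ranges. In practice this kind of filtering would more likely live in a firewall or web server config than in application code.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# not the addresses of any real crawler.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("192.0.2.17"))   # inside the first blocked range
    print(is_blocked("203.0.113.5"))  # outside both blocked ranges
```

Unlike robots.txt, this is enforced by the site rather than honored by the crawler, though a crawler with access to other address ranges could still get around it.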

Impossible to block Google? Probably true. But only because they have been allowed to grow so large as to be unbeatable. And the reason they've been able to do that is what the commenter said: websites allowed them to crawl, fast and hard, year after year. This is not true for all bots.

I'm not sure how this is relevant to the article and the specific issue. I don't disagree with what Google is doing in this case. And I understand why and where Google is headed.

The issue of who is allowed to crawl and who is not is something the commenter raised. It's a huge issue that people take for granted, in my opinion.