Hacker News
by alexey-salmin 994 days ago
Many website owners do explicitly block Bing and all other crawling bots in their right minds.

If you have a large website with many pages but little traffic, bots of all sorts will make up the majority of your traffic and, consequently, the majority of your AWS costs.

If you see money being spent on search engine X indexing your site while very few users arrive from that search engine, it's a rational decision to block it, or to ask for money (that actually happens).
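One way a site owner might put numbers on this trade-off is to count, per crawler, how many pages it fetched versus how many visitors its search engine referred, and then compare estimated costs. This is a minimal sketch, assuming Combined Log Format access logs; the `CRAWLERS` mapping, the function names, and the per-fetch and per-visit dollar figures are hypothetical, and user-agent strings can be spoofed.

```python
import re
from collections import Counter

# Matches the trailing "referer" "user-agent" fields of a Combined Log Format line.
TAIL_RE = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')

# Hypothetical mapping: crawler user-agent token -> referer substring that
# identifies human visitors sent by the same search engine.
CRAWLERS = {
    "Googlebot": "google.",
    "bingbot": "bing.com",
}


def crawl_vs_referrals(lines):
    """Count fetches by each crawler and visits referred by its search engine."""
    fetches, referrals = Counter(), Counter()
    for line in lines:
        m = TAIL_RE.search(line)
        if m is None:
            continue  # skip lines that don't end with referer and user-agent
        for bot, ref in CRAWLERS.items():
            if bot in m["agent"]:
                fetches[bot] += 1  # request made by the crawler itself
            elif ref in m["referer"]:
                referrals[bot] += 1  # visitor arriving from that engine
    return fetches, referrals


def should_block(fetches, referrals, cost_per_fetch, value_per_visit):
    """True when serving the crawler costs more than its referred traffic is worth."""
    return fetches * cost_per_fetch > referrals * value_per_visit
```

For example, `should_block(3_000_000, 50, 0.0001, 1.0)` is `True`: roughly $300 of crawl cost against $50 worth of referred visits under these made-up prices.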

Overall, it's a systemic problem for anyone building a search competitor:
1) Your costs are largely proportional to the size of your index.
2) Your income is proportional to your user base.
3) You need a huge index to be competitive, even if you don't have any users yet.
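Those three points can be turned into a toy unit-economics model. This is a minimal sketch, assuming costs scale linearly with index size and income scales linearly with users; the class name, its fields, and all dollar figures are hypothetical rather than taken from the comment.

```python
from dataclasses import dataclass


@dataclass
class SearchEngine:
    """Toy model of a search engine's yearly economics (all inputs hypothetical)."""
    index_pages: float       # pages kept in the index
    cost_per_page: float     # yearly crawl + storage + serving cost per page
    users: float             # active users
    revenue_per_user: float  # yearly revenue per user

    def profit(self) -> float:
        # Point 1: costs scale with the index. Point 2: income scales with users.
        return self.users * self.revenue_per_user - self.index_pages * self.cost_per_page

    def break_even_users(self) -> float:
        # Point 3: a competitive index must be paid for before users arrive.
        return self.index_pages * self.cost_per_page / self.revenue_per_user
```

With a hypothetical 10 billion pages at $0.0001 per page per year and $10 of yearly revenue per user, `break_even_users()` comes out to roughly 100,000 users, and with zero users `profit()` is a loss of about $1M a year.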

So it's very hard to bootstrap, even before counting the other advantages of the existing monopoly, such as browser-based distribution.

Microsoft has the money to beat the indexing problem, so it argues about distribution in court, but the small players can't even get to that level of failure.


> Many website owners do explicitly block Bing and all other crawling bots in their right minds.

If by "many" you mean "less than 0.1%". Nearly all sites want traffic, which search engines provide.

And you're moving the goalposts away from what the commenter I was responding to actually said.

Nobody disputes that it takes a large capital investment to create an index in the first place. But this is the business world; that's what investors are for.

But the claim that websites provide their public content to Google while leaving other crawlers no way to access it is simply untrue. That is not a factor hindering competition.

> Nearly all sites want traffic, which search engines provide.

You don't seem to grasp the problem of indexing. If a search engine fetches 100M pages but brings only 100 users, it's a net loss for the website because of server costs. This means that a marginal player cannot index a big website.
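The arithmetic behind the 100M-pages, 100-users example can be made explicit. Writing c for the server cost of one fetched page and v for the value of one referred visitor (both symbols are introduced here for illustration, not taken from the comment), the crawl pays off for the website only if:

```latex
N_{\text{users}}\, v \;\ge\; N_{\text{pages}}\, c
\iff \frac{v}{c} \;\ge\; \frac{N_{\text{pages}}}{N_{\text{users}}}
= \frac{10^{8}}{10^{2}} = 10^{6}
```

In other words, with those numbers each referred visitor would have to be worth as much as serving a million page fetches.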

I'm aware of at least one top-20 website that actively blocked all crawlers other than Google's, citing this as the reason. This website has tons of high-quality UGC (user-generated content) that ranks at the top for many nontrivial queries, so its absence is a huge blow to search quality.
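Blocking of this kind is commonly declared in a site's robots.txt, which well-behaved crawlers honor voluntarily (enforced blocking happens at the server or CDN instead). The sketch below uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt that admits only Googlebot; the file contents and the `SmallSearchBot` agent name are illustrative assumptions, not the actual rules of the site mentioned.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that admits Google's crawler only.
# An empty "Disallow:" allows everything for that user agent;
# "Disallow: /" under "User-agent: *" shuts out every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/some/page"
    for agent in ("Googlebot", "bingbot", "SmallSearchBot"):
        print(agent, rp.can_fetch(agent, url))
```

Running it prints `True` for Googlebot and `False` for the other two agents, which is exactly the asymmetry that leaves a newcomer's index missing that site's content.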

Having said that, it's true that for big search engines the issue is mostly distribution. For small players, however, it's distribution AND indexing, and in the end they have to resort to buying search results from the big players.