Hacker News
by melted 3774 days ago
Microsoft spends over $1B/yr on maintaining and improving Bing. So yeah, search does cost a fortune if you want to do it reasonably well without relying on others for results. Not only do you have to crawl the web (at different frequencies depending on how often each page's content is predicted to change, etc.), you then have to rebuild the index continuously, with reasonable latency. To do all of that, you need indexing infrastructure, which in turn requires storage infrastructure and high-performance data processing infrastructure (Hadoop ain't gonna do it, ask Yahoo why), which in turn requires high-performance networking (or your MapReduce-like workloads will collapse under their own weight). On top of that you need your own datacenters and your own army of machine learning researchers, systems engineers, hardware engineers, networking engineers, devops, quality engineers (people who improve your scoring function), human evaluators, etc., etc., etc. If anything, $1B/yr seems quite low.
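The idea of recrawling pages at different frequencies depending on how often their content is predicted to change can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Bing or any real crawler works: it assumes page changes arrive as a Poisson process, estimates the change rate naively from what was observed on past visits, and all names and constants (`PageStats`, `next_interval`, `RecrawlScheduler`, the interval bounds, the target probability) are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Illustrative assumptions, not values taken from any real crawler.
MIN_INTERVAL_H = 1.0          # never recrawl a page more often than hourly
MAX_INTERVAL_H = 24.0 * 30    # recrawl every page at least monthly
TARGET_CHANGE_PROB = 0.5      # revisit once a change is about 50% likely


@dataclass
class PageStats:
    changes: int = 0                   # revisits on which the content had changed
    observed_hours: float = 0.0        # total time spanned by those revisits
    last_visit_h: Optional[float] = None

    def change_rate(self) -> float:
        """Naive changes-per-hour estimate; undercounts several changes between visits."""
        if self.observed_hours <= 0:
            return 0.0
        return self.changes / self.observed_hours


def next_interval(stats: PageStats) -> float:
    """Hours until the next crawl, assuming changes follow a Poisson process."""
    rate = stats.change_rate()
    if rate <= 0:
        return MAX_INTERVAL_H
    # P(at least one change within t) = 1 - exp(-rate * t); solve for t at the target.
    t = -math.log(1.0 - TARGET_CHANGE_PROB) / rate
    return min(max(t, MIN_INTERVAL_H), MAX_INTERVAL_H)


class RecrawlScheduler:
    """Min-heap of (due_time_h, url): the page due soonest comes out first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._stats: Dict[str, PageStats] = {}

    def add(self, url: str, now_h: float) -> None:
        self._stats[url] = PageStats()
        heapq.heappush(self._heap, (now_h, url))  # new pages are due immediately

    def pop_due(self, now_h: float) -> List[str]:
        due = []
        while self._heap and self._heap[0][0] <= now_h:
            due.append(heapq.heappop(self._heap)[1])
        return due

    def record_visit(self, url: str, now_h: float, changed: bool) -> float:
        """Update change statistics after a fetch, reschedule, and return the next due time."""
        s = self._stats[url]
        if s.last_visit_h is not None:
            # "changed" is only meaningful relative to a previous fetch.
            s.observed_hours += now_h - s.last_visit_h
            s.changes += int(changed)
        s.last_visit_h = now_h
        due = now_h + next_interval(s)
        heapq.heappush(self._heap, (due, url))
        return due
```

A page that changed on 10 visits over 100 observed hours gets a rate of 0.1 changes/hour and a recrawl interval of about 6.9 hours, while a page never seen to change drifts out to the monthly ceiling. The heap keeps picking the next due page cheap, but at web scale this bookkeeping would itself be a distributed system.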

> Microsoft spends over $1B/yr on maintaining and improving Bing. So yeah, search does cost a fortune

I doubt that figure, but even if it is true, you're still talking about a finished product (not a new startup). That figure includes things like marketing and insane web traffic, which a new search engine would not have initially.

I'm not saying running a search engine is cheap. It's not. But the implication is that it's like the pharmaceutical industry, where you need billions just to get in the game. It's not like that at all. In fact, a few search engines have tried to get in the game, like Cuil (anyone remember them?). They may have failed, but that doesn't mean it's impossible. And it certainly doesn't take hundreds of millions of dollars to get started.

Insane web traffic doesn't actually cost you much. Au contraire, my friend, it brings in the dough. What costs you money is the long tail, those deep, obscure searches you have to answer to be perceived as a worthwhile competitor to e.g. Google. To answer the long tail yourself, you must have it in the index, which makes for a _very_ big index. Then you also have to figure out how to get the relevant results out of there in a hundred milliseconds (or 50 ms if you're Google and have instant search), and keep all that goodness fresh, so that it doesn't take three months for new pages to appear. If you think this wouldn't cost a startup much, you don't know much about search.

And you don't need to doubt that figure: Microsoft discloses it in its financial results.
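The long-tail point can be illustrated with a toy inverted index. This is a minimal, hypothetical sketch (`InvertedIndex`, `search_all`, and `_intersect` are names made up for illustration): it keeps sorted posting lists in memory and answers conjunctive (all-terms) queries by intersecting them, starting from the shortest list. It shows why a query whose rare term was never indexed simply returns nothing. A real engine would also shard the index across many machines, compress the posting lists, and rank the results, none of which is modeled here.

```python
from collections import defaultdict
from typing import Dict, List


def _intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear merge of two sorted posting lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class InvertedIndex:
    """Toy in-memory inverted index: term -> sorted list of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[int]] = defaultdict(list)

    def add(self, doc_id: int, text: str) -> None:
        # Assumes doc ids are added in increasing order, so postings stay sorted.
        for term in set(text.lower().split()):
            self._postings[term].append(doc_id)

    def search_all(self, query: str) -> List[int]:
        """Ids of documents containing every query term."""
        terms = set(query.lower().split())
        if not terms:
            return []
        # A term missing from the index yields an empty list and therefore no
        # results: long-tail queries can only be answered if they were indexed.
        lists = [self._postings.get(t, []) for t in terms]
        lists.sort(key=len)  # start from the rarest term to keep candidates small
        result = list(lists[0])
        for other in lists[1:]:
            if not result:
                break
            result = _intersect(result, other)
        return result
```

Starting from the rarest term is what keeps obscure queries cheap to evaluate once the data is there; the expensive part the comment describes is having every rare term's posting list indexed and fresh in the first place.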