by melted 3770 days ago
Google is one of DDG's backends, and the highest-quality one at that. DDG does not have its own crawlers or index. Nor could it: those things cost hundreds of millions of dollars per year to run and maintain.

[Edit: apparently this is not true. DDG does not use Google, though it does use Bing and some other broad- and narrow-coverage search engines.]


DuckDuckGo in no way uses Google. We do use other sources, but Google isn't one of them.
Looks like my information is either out of date or was incorrect all along. Bing seems to be the primary source. They seem to also have a crawler now, though it's not clear how much it really covers. Very cool. Slow and steady wins the race.
I posted this elsewhere, but I didn't get an answer. Since you work for DDG, you might know.

How long would DuckDuckGo have to grow at its current rate to become an actual blip on Google's radar? On the one hand, you're small. On the other hand, keep up a fast growth rate long enough and you get bigger, sooner, than people intuitively expect.

https://duckduckgo.com/traffic.html
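The "grow fast long enough" point is compound-growth arithmetic. A minimal sketch, assuming both sites grow at steady annual rates (real traffic does not): if the rival is `gap` times larger today, the gap closes when gap * (1 + g)^t = (1 + r)^t, i.e. t = ln(gap) / (ln(1 + r) - ln(1 + g)). The function name and the inputs below are hypothetical illustrations, not DDG's or Google's actual figures.

```python
import math

def years_to_close_gap(gap: float, own_growth: float, rival_growth: float = 0.0) -> float:
    """Years of steady compound growth needed to close a size gap.

    gap:          how many times larger the rival is today (e.g. 100 for 100x)
    own_growth:   your annual growth rate (1.0 means +100%/yr)
    rival_growth: the rival's annual growth rate

    Solves gap * (1 + rival_growth) ** t == (1 + own_growth) ** t for t.
    """
    if gap <= 1:
        return 0.0  # already at or above parity
    relative = math.log1p(own_growth) - math.log1p(rival_growth)
    if relative <= 0:
        raise ValueError("the gap never closes unless you outgrow the rival")
    return math.log(gap) / relative

# Hypothetical inputs, not real traffic figures:
print(round(years_to_close_gap(100, 1.0), 2))  # 6.64
print(round(years_to_close_gap(100, 1.0, 0.1), 2))
```

Becoming a "blip" needs far less than parity: to ask when you'd be within 10x rather than equal, pass the current gap divided by 10.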

> Nor could it: those things cost hundreds of millions of dollars per year to run and maintain.

I vehemently disagree. The early web crawlers and indexes did not cost hundreds of millions of dollars to run. Granted, there were significantly fewer web pages then than there are today, but the cost you're referring to is the entirety of Google's servers. That price tag also includes the cost of serving the traffic of the number 1 website in the world. You're talking about a price tag that reflects a finished product.

A new search engine would not have those costs initially and, if managed properly from the very beginning, would be able to scale, cover its bills, and remain profitable up to the point of reaching (and, hypothetically, replacing) Google.

Microsoft spends over $1B/yr on maintaining and improving Bing. So yeah, search does cost a fortune if you want to do it reasonably well without relying on others for results. Not only do you have to crawl the web (at different frequencies depending on how often you predict each page's content will change, etc.), you then have to rebuild the index continuously, with reasonable latency. To do all of that, you need indexing infrastructure, which in turn requires storage infrastructure and high-performance data processing infrastructure (Hadoop ain't gonna do it; ask Yahoo why), which in turn requires high-performance networking (or your MapReduce-like workloads will collapse under their own weight), your own datacenters, and your own army of machine learning researchers, systems engineers, hardware engineers, networking engineers, devops, quality engineers (people who improve your scoring function), human eval, etc., etc., etc. If anything, $1B/yr seems quite low.
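The comment doesn't say how a crawler predicts update frequency. As one simple illustration (not how Bing or Google actually do it), a crawler could estimate each page's change rate from how often its content differed on past re-fetches, recrawl roughly once per expected change, and keep a priority queue of due times. All names and constants here (`PageStats`, `MIN_INTERVAL_H`, the smoothing prior, the example URLs) are hypothetical.

```python
import heapq
from dataclasses import dataclass

# Hypothetical policy constants, chosen only for illustration.
MIN_INTERVAL_H = 1.0         # never recrawl a page more than once an hour
MAX_INTERVAL_H = 24.0 * 30   # but revisit every page at least monthly
PRIOR_HOURS = 24.0           # smoothing prior: one pseudo-change over one pseudo-day

@dataclass
class PageStats:
    url: str
    changes: int = 0              # re-fetches where the content hash differed
    hours_observed: float = 0.0   # how long we have been watching the page

    def change_rate_per_hour(self) -> float:
        # Smoothed estimate so a new page gets a moderate rate, never 0.
        return (self.changes + 1) / (self.hours_observed + PRIOR_HOURS)

def recrawl_interval_hours(page: PageStats) -> float:
    # Expected time between changes is 1 / rate; clamp it to the policy bounds.
    interval = 1.0 / page.change_rate_per_hour()
    return max(MIN_INTERVAL_H, min(MAX_INTERVAL_H, interval))

def build_schedule(pages: list[PageStats], now_h: float = 0.0) -> list[tuple[float, str]]:
    """Min-heap of (due_time_in_hours, url); heappop yields the next page to fetch."""
    heap = [(now_h + recrawl_interval_hours(p), p.url) for p in pages]
    heapq.heapify(heap)
    return heap

news = PageStats("https://example.com/news", changes=40, hours_observed=48)
archive = PageStats("https://example.com/archive/2009", changes=0, hours_observed=2400)
schedule = build_schedule([archive, news])
print(heapq.heappop(schedule))  # the news page comes due first
```

The fast-changing page is revisited every couple of hours while the stale archive page hits the monthly ceiling. At web scale the hard part is doing this for billions of URLs and feeding the results into continuous index rebuilds, which is where the infrastructure listed above comes in.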
> Microsoft spends over $1B/yr on maintaining and improving Bing. So yeah, search does cost a fortune

I doubt that figure, but even if it is true, you're still talking about a finished product (not a new startup). It's a figure that includes things like marketing and insane web traffic, which a new search engine would not have initially.

I'm not saying running a search engine is cheap. It's not. But the implication is that it's like the pharmaceutical industry, where you need billions just to get in the game. It's not like that at all. In fact, a few search engines have tried to get in the game, like Cuil (anyone remember them?). They may have failed, but that doesn't mean it's impossible. And it certainly doesn't take hundreds of millions of dollars to get started.

Insane web traffic doesn't actually cost you much. Au contraire, my friend, it brings in the dough. What costs you money is the long tail, those deep, obscure searches you have to answer to be perceived as a worthwhile competitor to e.g. Google. To answer the long tail yourself, you must have it in the index, which makes for a _very_ big index. And then you also have to figure out how to get the relevant results out of there in a hundred milliseconds (or 50ms if you're Google and you have Instant Search), and update all that goodness to keep it fresh, so that it doesn't take three months for new pages to appear. If you think this wouldn't cost much for a startup, you don't know much about search.

And you don't need to doubt that figure, Microsoft discloses it in its financial results.
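Why the long tail is expensive is easier to see with a toy inverted index: every distinct term, however rare, needs its own postings list, and a query is answered by intersecting the lists for its terms. This is a minimal sketch with hypothetical names (`build_index`, `search`) and a naive whitespace tokenizer; it ignores ranking, compression, sharding across machines, and incremental updates, which is exactly where the real latency and freshness costs described above come from.

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it (its postings)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return ids of documents containing every query term (boolean AND, no ranking)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the shortest postings first so intermediate results stay small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
        if not result:
            break  # no document can match all terms
    return sorted(result)

docs = {
    1: "DuckDuckGo uses Bing results",
    2: "Bing index covers the long tail",
    3: "the long tail of obscure queries",
}
index = build_index(docs)
print(search(index, "long tail"))  # [2, 3]
```

With three documents this is instant; with billions of pages, the postings for common terms become enormous while the long tail adds a huge number of rare terms, so hitting a 100ms budget means splitting the index across many machines and querying them in parallel.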