HN Mirror



	by slashdev 2534 days ago
	Storage and bandwidth are cheaper than ever; people scrape a billion pages for far more mundane purposes these days, even for academic papers. Building a full-text index on top of that is more involved but hardly impossible. You're completely right that it's not at all Google's secret sauce. Bing has clearly indexed much more than that, and has invested heavily in actually returning good results from its index. And still nearly nobody cares. It's just not easy to make a better Google, and the people most likely to figure out how to do that already work there.
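
The core data structure behind a full-text index is an inverted index: a map from each term to the pages that contain it. The sketch below is a toy, in-memory version for illustration only; the page URLs and texts are made up, and everything that makes a real engine hard at a billion pages (sharding, compression, ranking, incremental updates) is deliberately left out.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit (a naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of page URLs whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND query: return only pages that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical pages, invented for illustration.
pages = {
    "https://a.example/1": "Storage and bandwidth are cheap",
    "https://b.example/2": "Bandwidth is cheap now",
}
index = build_index(pages)
```

Here `search(index, "cheap bandwidth")` returns both URLs, while `search(index, "storage")` returns only the first. Production engines layer relevance ranking on top of these postings lists, which is where most of the difficulty described above lives.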


aantix 2534 days ago

The Common Crawl corpus is already stored on S3, so analyzing billions of web pages takes nothing more than an AWS account and a simple MapReduce job.

I'd actually advocate for making public an anonymized list of actual search queries.

Domain-specific search engines could then evolve based on the demand revealed by what people have already searched for.
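
The "simple map reduce job" mentioned above usually has the shape of a word count: each mapper turns one page into partial counts, and reducers merge those partial counts. The sketch below runs that shape in a single Python process, assuming the pages have already been extracted to plain-text strings; fetching and parsing the actual Common Crawl files from S3 is not shown, and on a real cluster a framework such as Hadoop or Spark would distribute the map and reduce steps.

```python
import re
from collections import Counter
from functools import reduce


def map_page(text: str) -> Counter:
    # Map step: emit term counts for a single page.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    # Reduce step: merge two partial count tables by summing per-term counts.
    return left + right


def term_frequencies(pages: list[str]) -> Counter:
    # Apply the map step to every page, then fold the results with the reduce step.
    return reduce(reduce_counts, map(map_page, pages), Counter())


# Hypothetical extracted page texts, invented for illustration.
pages = ["web pages web", "pages"]
freqs = term_frequencies(pages)
```

Because the reduce step is associative, partial counts can be merged in any grouping, which is what lets a cluster split the work across many machines.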


Sander_Marechal 2534 days ago

Anonymizing search queries is extremely hard, if not impossible. See https://en.wikipedia.org/wiki/AOL_search_data_leak for an example.
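
One reason it is hard: replacing user names with opaque IDs (the AOL release used numbers) still leaves each person's queries linked together, and the queries themselves often mention local places or the searcher's own name. The sketch below shows naive pseudonymization with a salted hash; the salt, user IDs and queries are all hypothetical.

```python
import hashlib
from collections import defaultdict


def pseudonymize(user_id: str, salt: str) -> str:
    # Naive "anonymization": replace the real user ID with a truncated salted hash.
    return hashlib.sha256((salt + user_id).encode()).hexdigest()[:12]


def group_by_pseudonym(log: list[tuple[str, str]], salt: str) -> dict[str, list[str]]:
    # Every query from the same user maps to the same pseudonym, so profiles survive.
    profiles: dict[str, list[str]] = defaultdict(list)
    for user_id, query in log:
        profiles[pseudonymize(user_id, salt)].append(query)
    return dict(profiles)


# Hypothetical query log, invented for illustration.
log = [
    ("alice", "plumbers in springfield"),
    ("bob", "weather tomorrow"),
    ("alice", "dentist near elm street"),
    ("alice", "smith family genealogy"),
]
profiles = group_by_pseudonym(log, salt="example-salt")
```

No real user ID appears in `profiles`, yet one pseudonym still carries a town, a street and a surname, which is often enough to identify the person behind it.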


ForHackernews 2534 days ago

> It's just not easy to make a better Google

It depends which sense of "better" you mean. It's nearly trivial to make an ethically superior search engine by just not building the spyware bits of Google.

It's difficult to make a search engine that's "better" along the dimensions of speed, profitability, etc.


slashdev 2534 days ago

That exists; it's called DuckDuckGo, and even fewer people care about it than about Bing. For the most part, people don't actually care about Google collecting their entire search history and combining it with the rest of the data it has on them. We may live to regret that in a hypothetical future where the government turns more authoritarian and requisitions that data for evil.


eloff 2534 days ago

I made three statements. They're all true as far as I can see. Would the downvoters care to speak up?
