by Sven7 4787 days ago
Your comment highlights the problem. When people think search, they think Google. Search is bigger than that, and Google is, in a way, limiting people's imagination about search through its own success and utility.

Just look at their menu bar...images, videos, flights, blogs, shopping, books, patents, apps.

Is that it?

Not to mention random. So we just sit around waiting for some benevolent god in Mountain View to say, "You know what, now let the mortals have... recipes."

If they want to expand that list to the infinite domains it should be covering, it is never going to happen with the resources they have. They need to open the index to tap into its full potential.

1 comment

I doubt any "index" exists in the way the word indicates. It's likely highly customized for how it's accessed. What would an API for that look like? Do you want to be able to run arbitrary regexes (that would be awesome... I miss code search)? Or do you just want a disk sitting somewhere with all the internet on it so that you can run custom programs on it?
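
To make the question concrete, here is a minimal sketch of the three access styles mentioned above: exact term lookup, regexes over the vocabulary, and a brute-force scan of the raw text. Everything in it (the `ToyIndex` class and its method names) is hypothetical and invented for illustration; it says nothing about how Google's index is actually built or accessed.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Hypothetical inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.docs = {}                    # id -> raw text ("the disk")

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def lookup(self, term):
        # Exact term lookup: the cheap operation an index is built for.
        return set(self.postings.get(term.lower(), set()))

    def regex_terms(self, pattern):
        # Regex matched against whole indexed terms, not against page text.
        rx = re.compile(pattern)
        hits = set()
        for term, ids in self.postings.items():
            if rx.fullmatch(term):
                hits |= ids
        return hits

    def scan(self, pattern):
        # "Disk with the internet on it" mode: regex over every raw document.
        rx = re.compile(pattern)
        return {doc_id for doc_id, text in self.docs.items() if rx.search(text)}


index = ToyIndex()
index.add("a", "Tomato soup recipe with basil")
index.add("b", "Flight search from Oslo")
index.add("c", "Soups and stews for winter")
```

With these three documents, `index.lookup("soup")` finds only `a`, `index.regex_terms(r"soups?")` finds `a` and `c`, and `index.scan(r"[Ff]light")` finds `b`. The design point is cost: the first call touches one posting list, the second walks the whole vocabulary, and the third reads every document, which is why "just expose the index" depends heavily on which of these you mean.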

Identifying what a recipe looks like, and then providing a search interface that can figure out which of the millions of variations of some soup recipe a person is looking for and which is more authoritative than the others (not blog spam with minor but random alterations, and not something written by an amateur with no business in the kitchen), is a hard problem. Crawling isn't really the hard part. It takes a lot of hardware and time, but then you have all this data... and that's when the hard part starts.
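
As a rough illustration of just one slice of that problem, the "blog spam with minor alterations" part, here is a sketch that flags near-duplicate texts using word shingles and Jaccard similarity. This is a standard textbook technique swapped in for illustration, not anything the source says a search engine uses; the function names, the shingle size `k=3`, and the `0.5` threshold are all assumptions. It does nothing about the harder question of which copy is authoritative.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return pairs of doc ids whose shingle sets overlap at or above threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sets)
    pairs = []
    # Compare every pair once; fine for a demo, quadratic at web scale.
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if jaccard(sets[left], sets[right]) >= threshold:
                pairs.append((left, right))
    return pairs


docs = {
    "orig": "Simmer the tomatoes with onion and garlic for twenty minutes then blend until smooth",
    "spam": "Simmer the tomatoes with onion and garlic for thirty minutes then blend until smooth",
    "other": "Book a cheap flight from Oslo to Lisbon in the spring",
}
print(near_duplicates(docs))
```

The all-pairs loop is the part that would not survive contact with a real crawl, which is roughly the point of the comment above: having the data is the easy half.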

I'm interested in what others think a useful "index" API would look like, though.

If you just want a disk sitting around with the Internet on it, check out Common Crawl (http://commoncrawl.org/).