
I doubt any "index" exists in the way the word implies. It's likely highly customized for how it's accessed. What would an API for that look like? Do you just want to be able to run arbitrary regexes (that would be awesome... I miss code search)? Or do you just want a disk sitting somewhere with the whole internet on it so that you can run custom programs against it?
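To make the two options concrete, here is a rough Python sketch of what each access model might look like on a toy corpus. Everything in it is an assumption for illustration: the `PAGES` data, the URLs, and the function names `build_inverted_index`, `term_lookup`, and `regex_scan` are hypothetical and don't describe how any real search engine stores or exposes its data.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for crawled pages (URL -> text).
PAGES = {
    "a.example/soup": "Tomato soup recipe with basil",
    "b.example/code": "def parse_soup(html): return html",
    "c.example/blog": "My trip to Rome",
}

def build_inverted_index(pages):
    """Map each lowercased word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(url)
    return index

def term_lookup(index, term):
    """'Index' style access: fast, but only answers whole-term queries."""
    return sorted(index.get(term.lower(), set()))

def regex_scan(pages, pattern):
    """'Disk with the internet on it' style access: flexible, but reads every page."""
    rx = re.compile(pattern)
    return sorted(url for url, text in pages.items() if rx.search(text))

INDEX = build_inverted_index(PAGES)
```

With this toy data, `term_lookup(INDEX, "soup")` only finds the recipe page, because the tokenizer treats `parse_soup` as a single term, while `regex_scan(PAGES, r"[Ss]oup")` finds both pages. That gap is roughly why a term index and a regex-capable code search end up being different systems.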

The hard problem is identifying what a recipe looks like, then providing a search interface that can figure out which of the millions of variations of some soup recipe a person is actually looking for, and which of those is more authoritative than the others (not blog spam with minor but random alterations, and not something written by an amateur with no business in the kitchen). Crawling isn't really the hard part. It takes a lot of hardware and time, but then you have all this data... and that's when the hard part starts.
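As one small piece of that, here is a minimal sketch of flagging the "minor alterations" kind of blog spam using word shingles and Jaccard similarity. This is a standard textbook technique I'm swapping in for illustration, not a claim about what any search engine actually does; the function names and sample texts are made up.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word windows (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Made-up sample texts: an original, a lightly altered copy, and an unrelated recipe.
original = "Simmer the tomatoes with garlic and basil for twenty minutes then blend"
spam = "Simmer the tomatoes with garlic and basil for thirty minutes then blend"
unrelated = "Preheat the oven and roast the chicken until the skin is crisp"

near_dup_score = jaccard(shingles(original), shingles(spam))
unrelated_score = jaccard(shingles(original), shingles(unrelated))
```

The altered copy scores well above the unrelated text, so a threshold can cluster near-duplicates. Note what this does not do: it says nothing about which copy is the original or which author is authoritative, and that is the part that stays hard.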

I'm interested in what others think a useful "index" API would look like, though.