by ampersandy 1665 days ago
Requiring users to know what sites they want in advance somewhat defeats the purpose of a search engine, no?

Not at all. You only have to fail the first request. It's an approach I took with my own attempt at a search engine way back! In fact, I know for certain that there is at least one patent out there that proposes asking users who make a request for the first time to provide the appropriate response, as an efficient way to teach the system for future users.

Obviously, failing first requests isn't ideal, but for popular requests the cost quickly becomes insignificant. Wikipedia might (if it doesn't already) want to prompt users in a similar way to contribute when they land on a thin or missing page.
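A minimal sketch of the "fail the first request, let that user teach the system" idea, assuming an in-memory store keyed by normalised query text. `LearningIndex`, `lookup`, and `contribute` are hypothetical names for illustration, not the commenter's implementation or the method described in any patent:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearningIndex:
    """Hypothetical query -> URL store that learns from first-time requesters.

    A miss on a new query returns None; the caller then asks the user for the
    right answer and records it with contribute(), so later users get a hit.
    """
    answers: dict[str, list[str]] = field(default_factory=dict)

    @staticmethod
    def _key(query: str) -> str:
        # Normalise case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def lookup(self, query: str) -> Optional[list[str]]:
        return self.answers.get(self._key(query))

    def contribute(self, query: str, url: str) -> None:
        # Record a user-supplied answer, ignoring duplicates.
        urls = self.answers.setdefault(self._key(query), [])
        if url not in urls:
            urls.append(url)
```

The first `lookup` for a query misses; once one user calls `contribute`, every later user asking the same (normalised) query gets the stored URLs, which is why the cost fades for popular queries.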

> Obviously failing first requests isn't ideal but for popular requests it quickly becomes insignificant.

The first request can also be handled asynchronously, showing the user a 'processing...' message in the meantime.
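One way to sketch this with `asyncio`, assuming a caller-supplied async `crawl` function (hypothetical) that fetches results for a query: a cache miss schedules the crawl as a background task and immediately returns a placeholder, and a later request for the same query picks up the finished result.

```python
from __future__ import annotations

import asyncio


class AsyncFirstRequest:
    """Hypothetical handler: a cache miss starts a background crawl and the
    user is told the query is 'processing...' instead of waiting on it."""

    def __init__(self, crawl):
        self._crawl = crawl  # assumed async callable: query -> result
        self._results: dict[str, str] = {}
        self._pending: dict[str, asyncio.Task] = {}

    async def _run(self, query: str) -> None:
        try:
            self._results[query] = await self._crawl(query)
        finally:
            # Clear the pending marker even if the crawl fails, so it can be retried.
            self._pending.pop(query, None)

    def handle(self, query: str) -> str:
        if query in self._results:
            return self._results[query]
        if query not in self._pending:
            # Schedule the crawl without awaiting it; needs a running event loop.
            self._pending[query] = asyncio.create_task(self._run(query))
        return "processing..."
```

Deduplicating through `_pending` means many users hitting the same new query trigger only one crawl.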

More often than not, I have an idea of which site a result will be on when I issue a query:

If I search for a news event, it's a news site.

If I search for an error message, I know the result is likely to be on Stack Overflow, GitHub issues, or the library's forum.

etc.

I don't think this strategy will get you all the way there, but it could be combined with other ways of curating sites to crawl.
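The query-type intuition above could be sketched as a small routing table that decides which sites to search first. The regex patterns and site lists here are illustrative assumptions, not tuned heuristics, and `candidate_sites` is a hypothetical helper:

```python
from __future__ import annotations

import re

# Hypothetical routing rules: each pairs a crude query classifier with the
# sites a user would expect the answer to live on.
ROUTES = [
    (re.compile(r"(error|exception|traceback|undefined|segfault)", re.I),
     ["stackoverflow.com", "github.com"]),
    (re.compile(r"\b(news|election|earthquake|breaking)\b", re.I),
     ["reuters.com", "apnews.com", "bbc.co.uk"]),
]


def candidate_sites(query: str, fallback: list[str]) -> list[str]:
    """Return sites to search first for `query`, then a general fallback list."""
    sites: list[str] = []
    for pattern, route_sites in ROUTES:
        if pattern.search(query):
            for site in route_sites:
                if site not in sites:
                    sites.append(site)
    # Always append the fallback so routing changes the order, not the coverage.
    for site in fallback:
        if site not in sites:
            sites.append(site)
    return sites
```

Keeping the fallback list in the output reflects the point that this strategy alone won't get you all the way there; it only prioritises a curated set of sites.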

Since sites are so desperate to be indexed, doesn't it seem better to put the onus on them to announce themselves? It would be great if DNS registries published public keys... maybe they do in newer schemes?

That works once your search engine is more widely used, but not many sites are going to register with a niche search engine. Many users, on the other hand, really want a search engine like this and would be willing to invest some time.

Certificate Transparency (CT) logs are exactly this.
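A rough sketch of using CT data to seed a crawl frontier. Reading the logs directly means parsing Merkle tree leaves from each log's API, so this assumes instead the JSON returned by the crt.sh CT search service (`?q=...&output=json`), where each record's `name_value` field holds one or more certificate names separated by newlines. `hostnames_from_ct` is a hypothetical helper, and a name appearing in a certificate does not guarantee a crawlable public site:

```python
from __future__ import annotations

import json


def hostnames_from_ct(records_json: str) -> list[str]:
    """Extract unique hostnames, in first-seen order, from CT search results."""
    seen: set[str] = set()
    hosts: list[str] = []
    for record in json.loads(records_json):
        # Assumed crt.sh shape: newline-separated names in "name_value".
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Wildcard certs ("*.example.com") don't name a single host,
            # so fall back to the parent domain.
            if name.startswith("*."):
                name = name[2:]
            if name and name not in seen:
                seen.add(name)
                hosts.append(name)
    return hosts
```

Fetching the JSON (and respecting the service's rate limits) is left to the caller; the function only turns records into candidate hosts.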