| HN Mirror



	by ampersandy 1665 days ago
	Requiring users to know what sites they want in advance somewhat defeats the purpose of a search engine, no?

3 comments

robbomacrae 1665 days ago

Not at all. Only the first request has to fail. It is an approach I took with my own attempt at a search engine way back! In fact, I know personally of at least one patent that suggests asking the first user who makes a given request to provide the appropriate response, as an efficient way to teach the system for future users.

Obviously failing first requests isn't ideal but for popular requests it quickly becomes insignificant. Wikipedia might (if they don't already) want to make a similar suggestion, asking users to contribute when they land on a low-content or missing page.
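
A minimal sketch of the "fail the first request, then learn from the user" idea, in Python. The class name `LearnedIndex`, its methods, and the sample query and site are all hypothetical, chosen only for illustration; this is not the approach described in any particular patent or engine.

```python
from collections import defaultdict


class LearnedIndex:
    """Toy index that learns which sites answer a query from user suggestions."""

    def __init__(self):
        # normalized query -> {site: number of users who suggested it}
        self._votes = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(query):
        # Lowercase and collapse whitespace so near-identical queries match.
        return " ".join(query.lower().split())

    def search(self, query):
        """Return known sites, most-suggested first; an empty list means 'ask the user'."""
        votes = self._votes.get(self._normalize(query), {})
        return sorted(votes, key=lambda site: (-votes[site], site))

    def suggest(self, query, site):
        """Record a user's suggestion for a query that had no results."""
        self._votes[self._normalize(query)][site] += 1


index = LearnedIndex()
first = index.search("python asyncio tutorial")    # first request fails
index.suggest("Python  asyncio tutorial", "docs.python.org")
second = index.search("python asyncio tutorial")   # later users get an answer
```

Only the very first user of a query sees the empty result; for popular queries that cost is paid once.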


lowwave 1665 days ago

> Obviously failing first requests isn't ideal but for popular requests it quickly becomes insignificant.

The first request can also be handled asynchronously, with a message shown to the user that it is 'processing...'.
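
A rough sketch of that asynchronous handling, using Python's `asyncio`. The `crawl_and_index` coroutine is a hypothetical stand-in for whatever slow work a first-time query triggers, and the `status` values are made up for this example.

```python
import asyncio


async def crawl_and_index(query):
    """Hypothetical stand-in for the slow crawl triggered by a first-time query."""
    await asyncio.sleep(0.01)  # placeholder for real crawling work
    return [f"result for {query}"]


class AsyncSearch:
    def __init__(self):
        self._results = {}  # query -> finished results
        self._pending = {}  # query -> running asyncio.Task

    async def search(self, query):
        if query in self._results:
            return {"status": "done", "results": self._results[query]}
        if query not in self._pending:
            # Start the work in the background and answer immediately.
            self._pending[query] = asyncio.create_task(self._run(query))
        return {"status": "processing", "results": []}

    async def _run(self, query):
        self._results[query] = await crawl_and_index(query)
        del self._pending[query]

    async def wait_idle(self):
        # Helper for the demo: wait until all background work has finished.
        await asyncio.gather(*list(self._pending.values()))


async def demo():
    engine = AsyncSearch()
    first = await engine.search("rare query")   # returns at once: "processing"
    await engine.wait_idle()
    second = await engine.search("rare query")  # now served from stored results
    return first, second


first_reply, second_reply = asyncio.run(demo())
```

In a real service the client would poll or be notified rather than call something like `wait_idle`; that helper exists only to keep the example self-contained.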


ma2rten 1664 days ago

More often than not I have an idea which site a result might be on when I issue a query:

If I search for a news event it's a news site.

If I search for an error message, I know the result is likely to be on Stack Overflow, GitHub issues, or the library's forum.

etc.

I don't think this strategy will get you all the way there, but it could be combined with other ways of curating sites to crawl.


convolvatron 1665 days ago

Since sites are so desperate to be indexed, doesn't it seem better to put the onus on them to announce themselves? It would be great if DNS registries published public keys... maybe they do in newer schemes?


ma2rten 1664 days ago

That works once your search engine is more widely used, but not a lot of sites are going to register with a niche search engine. Many users, on the other hand, really want a search engine like this and would be willing to invest some time.


fragmede 1664 days ago

Certificate Transparency (CT) Logs are this.
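
As an illustration of how CT data could seed a crawler, here is a small Python sketch that pulls unique hostnames out of certificate records. It assumes the records have already been fetched and decoded into dicts with a newline-separated `name_value` field; that field name and the sample data are assumptions for this example, not a documented API.

```python
def seed_domains(ct_records):
    """Collect unique hostnames from CT-derived records to use as crawl seeds."""
    seen = set()
    for record in ct_records:
        # Assumed field: newline-separated names covered by the certificate.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry still reveals the parent domain
            if name:
                seen.add(name)
    return sorted(seen)


# Made-up sample records for illustration only.
sample = [
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "*.example.org"},
    {"name_value": "WWW.EXAMPLE.COM"},
]
domains = seed_domains(sample)
```

Because certificates are logged whenever a site obtains one, this gives a list of sites that have effectively announced themselves, without requiring them to register with the search engine.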
