HN Mirror

Hacker News

by ssubu 2376 days ago

[Disclaimer: I work at Cliqz] We do not crawl the web in the traditional sense; our search was bootstrapped on query logs. That is the very reason we could succeed in building a search engine with minimal resources compared to our competitors. We have written about this in a lot more detail here:

How we collect data: https://www.0x65.dev/blog/2019-12-03/human-web-collecting-da...

How we build the search using this data: https://www.0x65.dev/blog/2019-12-06/building-a-search-engin...

Feel free to peruse these posts and ask questions!

2 comments

petra 2376 days ago

What about tools for power searchers? Google has abandoned us.

Are you planning to create strong tools in that area?

For example: custom search engines, the NEAR operator, limiting search to sites that don't update that often or aren't linked to very strong sites (against SEO), etc.


ThePhysicist 2376 days ago

Does all of your data really come from the Human Web project, or do you also buy clickstream data from data brokers?


ssubu 2376 days ago

We speak about this in much more detail in this post (https://0x65.dev/blog/2019-12-05/a-new-search-engine.html), but in short, we initially prototyped our search with data we purchased from data brokers. Since the concept was proven and HumanWeb was deployed (2015/2016), we have relied only on our own data.
