

by tconaugh 3229 days ago

They use distributed web crawlers to crawl hundreds of billions of web pages, most likely through one of the following options:

1) Build their own crawlers.

2) Run an Apache Nutch or Heritrix cluster in a colocation facility.

3) Use a third-party service such as Mixnode.
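For a sense of what "distributed" usually means here, a common design (an assumption about typical crawlers, not a description of any specific company's setup) is to partition the URL frontier across nodes by hashing each URL's hostname. Every URL on a given host then lands on the same node, which can enforce per-host politeness on its own. The names `node_for_url`, `Frontier`, and `partition` below are hypothetical, and the in-memory queue and seen-set stand in for the persistent storage a real crawl at this scale would need.

```python
from __future__ import annotations

import hashlib
from collections import deque
from urllib.parse import urlsplit


def node_for_url(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its hostname (hypothetical helper).

    All URLs on the same host go to the same node, so that node alone
    can apply per-host rate limits and cache robots.txt.
    """
    host = (urlsplit(url).hostname or "").lower()
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class Frontier:
    """A per-node URL frontier: FIFO queue plus a seen-set for deduplication."""

    def __init__(self) -> None:
        self.queue: deque[str] = deque()
        self.seen: set[str] = set()

    def add(self, url: str) -> bool:
        # Skip URLs this node has already scheduled.
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append(url)
        return True

    def next_url(self) -> str | None:
        return self.queue.popleft() if self.queue else None


def partition(urls: list[str], num_nodes: int) -> list[Frontier]:
    """Route each URL to the frontier of the node that owns its host."""
    frontiers = [Frontier() for _ in range(num_nodes)]
    for url in urls:
        frontiers[node_for_url(url, num_nodes)].add(url)
    return frontiers
```

Hashing by host rather than by full URL trades some load balance (one huge site lands on a single node) for simple, local politeness control.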