by tconaugh 3229 days ago
They use distributed web crawlers to crawl hundreds of billions of web pages. They probably do one of the following:

1) Build their own crawlers.

2) Run an Apache Nutch or Heritrix cluster in a colo facility.

3) Use a third-party service like Mixnode.
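
To give a rough idea of what option 1 involves, here is a minimal sketch of one piece of a distributed crawler: a URL frontier that assigns each host to a single worker, so per-host politeness can be handled locally. This is only an illustration, not how any particular company does it. The names `shard_for` and `Frontier` are made up for this example, and a real system would keep the queues and the seen-set in shared storage rather than in memory.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


def shard_for(url: str, num_workers: int) -> int:
    """Pick a worker by hashing the URL's host, so one host stays on one worker."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class Frontier:
    """Hypothetical in-memory frontier: one FIFO queue per worker, with de-duplication."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.queues = [deque() for _ in range(num_workers)]
        self.seen = set()

    def add(self, url: str) -> bool:
        """Queue a URL once; return False if it was already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queues[shard_for(url, self.num_workers)].append(url)
        return True

    def next_for(self, worker: int):
        """Return the next URL for a worker, or None if its queue is empty."""
        queue = self.queues[worker]
        return queue.popleft() if queue else None
```

Each worker would loop on `next_for(its_id)`, fetch the page, extract links, and call `add` on them. Everything else (robots.txt, rate limits, retries, storage) is where the real work is.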