by webtechgal 3570 days ago
It would be great if you could share (at least) some information about the kind of hosting setup you're using, how much bandwidth it needs, and how long it took to crawl and index the 2B pages.

4 servers in total.

2 are used for crawling, index-building and raw-data storage. Each has a quad-core CPU, 32 GB RAM, a 4 TB HDD and a 1 Gbit/s internet connection. They are rented and sit in a big data center. Crawling uses "only" about 200-250 Mbit/s of bandwidth.

2 servers for webserver and queries. Quad-core, 32 GB RAM. One with 2x512 GB SSD, the other with only 1x512 GB SSD. These servers are here at home. I have cable internet with 200 Mbit/s down, 20 Mbit/s up. Static IPs, obviously.

A full crawl currently takes about 3 months.
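
For a rough sense of scale, here is a back-of-the-envelope sketch of what those numbers imply. It assumes "3 months" means about 90 days, that the 200-250 Mbit/s figure is the sustained total for the whole crawl (not per server), and that a full crawl covers the 2B pages mentioned in the question. The function name `crawl_estimates` is just made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def crawl_estimates(pages, days, mbit_per_s):
    """Return (pages per second, average bytes per page, total bytes transferred).

    Assumes the bandwidth is sustained for the whole crawl and that
    1 Mbit = 1,000,000 bits.
    """
    seconds = days * SECONDS_PER_DAY
    pages_per_second = pages / seconds
    bytes_per_second = mbit_per_s * 1_000_000 / 8
    bytes_per_page = bytes_per_second / pages_per_second
    total_bytes = bytes_per_second * seconds
    return pages_per_second, bytes_per_page, total_bytes


# Assumed inputs: 2B pages, ~90 days, low and high end of the quoted bandwidth.
for mbit in (200, 250):
    pps, bpp, total = crawl_estimates(2_000_000_000, 90, mbit)
    print(f"{mbit} Mbit/s: {pps:.1f} pages/s, "
          f"{bpp / 1000:.1f} kB/page, {total / 1e12:.1f} TB transferred")
```

Under those assumptions it works out to roughly 257 pages per second, with an average of about 97 kB transferred per page at 200 Mbit/s (around 120 kB at 250 Mbit/s). These are estimates from the quoted figures, not measurements.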