by mtmail 3566 days ago
Deusu's crawl servers are located at https://www.hosteurope.de/en/Server/Root-Server/ while the website points to his home broadband ISP. Two servers at his specs would cost 200 Euro/month in total, with 5x more bandwidth than he currently uses. I'd say that's much cheaper than AWS. Of course crawl companies charge more: they run a business, pay system administrators, and have more backup and redundancy.

I'm not sure how he manages to crawl at this speed with so few resources.

We benchmarked Nutch and couldn't really get past 10-14 M(B)ps on a $1200/month machine, even though we hired a professional to optimize the setup. The same is roughly true of Heritrix.

Just wondering if there is something missing in his setup, such as per-domain/IP rate limiting.
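
To make it concrete, by rate limiting I mean something like the sketch below: remember when each host may next be fetched and make the crawler wait until then. This is only an illustration. The class name, the method name and the fixed delay are made up for the example, and none of it is taken from Deusu, Nutch or Heritrix.

```python
from urllib.parse import urlsplit


class HostRateLimiter:
    """Tracks the earliest time each host may be fetched again (hypothetical helper)."""

    def __init__(self, min_delay_s=1.0):
        # Assumed politeness delay between two requests to the same host.
        self.min_delay_s = min_delay_s
        self.next_allowed = {}  # host -> earliest allowed fetch time (seconds)

    def wait_time(self, url, now):
        """Return how long to wait before fetching url, and reserve that slot."""
        host = urlsplit(url).hostname or ""
        # A host never seen before is ready immediately.
        ready_at = max(self.next_allowed.get(host, now), now)
        # The following request to this host must come min_delay_s later.
        self.next_allowed[host] = ready_at + self.min_delay_s
        return ready_at - now
```

A crawler would call wait_time(url, time.monotonic()) and sleep for the returned number of seconds before fetching. A limiter like this caps throughput per host, so a crawl that skips it can look much faster on a small set of domains.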

You can check his source if you are curious how it works ;)