Even taking into account the drop in prices on AWS.
Also, if you take a quick look at companies that provide such services the prices are orders of magnitude higher than deusu's costs.
Deusu's crawl servers are located at https://www.hosteurope.de/en/Server/Root-Server/ while the website points to his home broadband ISP. Two servers at his specs would be 200 Euro/month total, with 5x more bandwidth than he currently uses. I'd say that's much cheaper that AWS. Of course crawl companies charge more: they run a business, pay system administrators, have more backup and redundancy.
I'm not sure how he manages to crawl with this speed using such low amount of resources.
We did a benchmark on Nutch and couldn't really pass the 10-14 M(B)ps on a $1200/month machine. Even though we hired a professional to optimize the setup. The same is roughly true about Heritrix.
Just wondering if there is something missing in his setup, such as domain/ip rate limiting.
The problem is the huge contrast with https://www.quora.com/How-much-would-it-cost-to-crawl-1-bill...
Even taking into account the drop in prices on AWS. Also, if you take a quick look at companies that provide such services the prices are orders of magnitude higher than deusu's costs.