by jvvlimme 5049 days ago
Java is well suited to the job and even powers some heavyweight crawlers, such as Heritrix (archive.org) and Nutch (Apache Foundation).

That said, it doesn't really matter which language you write your crawler in: its performance will be limited by other factors (network latency, storage, etc.) long before the language you choose becomes the bottleneck.
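
To put a rough number on that, here is a minimal sketch that compares time spent waiting on a request with time spent parsing the response. Everything in it is an assumption for illustration: the 50 ms latency, the fake page, and the names fake_fetch, extract_links and measure are all made up, and the "fetch" just sleeps instead of touching the network.

```python
import re
import time

SIMULATED_LATENCY_S = 0.05  # assumed round trip per request (hypothetical value)
PAGE = "<html>" + "<a href='/page'>link</a>" * 200 + "</html>"  # made-up page


def fake_fetch(url):
    """Stand-in for an HTTP request: sleeps instead of using the network."""
    time.sleep(SIMULATED_LATENCY_S)
    return PAGE


def extract_links(html):
    """The CPU-bound part of a crawl step: pull href values out of the page."""
    return re.findall(r"href='([^']*)'", html)


def measure(n_pages=10):
    """Return (seconds spent waiting, seconds spent parsing) over n_pages."""
    wait = parse = 0.0
    for i in range(n_pages):
        t0 = time.perf_counter()
        html = fake_fetch(f"http://example.invalid/{i}")
        t1 = time.perf_counter()
        extract_links(html)
        t2 = time.perf_counter()
        wait += t1 - t0
        parse += t2 - t1
    return wait, parse


if __name__ == "__main__":
    wait, parse = measure()
    print(f"waiting: {wait:.3f}s, parsing: {parse:.6f}s")
```

With latency in that range, the waiting time should dwarf the parsing time, so a faster language would only speed up the small slice. Real numbers depend on your pages, your network and how much work you do per page.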

So pick the language you're most comfortable with for crawling, and offload the data processing to a lower-level language that is better suited to that task.