by freeslave 5050 days ago
Nothing wrong with using Java, and there is something to be said for using the language you are most productive in. But if you are thinking of building a web crawler in Java, I would recommend taking a look at the Heritrix project (https://webarchive.jira.com/wiki/display/Heritrix/Heritrix). It's robust, open source, and easily extensible. It might be easier to write a custom module for it than to roll your own web crawler.