by freeslave
5050 days ago
There's nothing wrong with using Java, and there is something to be said for using the language you are most productive in. But if you are thinking of building a web crawler in Java, I would recommend taking a look at the Heritrix project: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
It's robust, open source, and easily extensible. It might be easier to write a custom module for it than to roll your own web crawler.