by beejiu
3402 days ago
Not too long ago I built a small web crawler using Node.js, figuring that crawlers spend most of their time waiting on I/O (e.g. downloading), so Node.js would be well suited. It is backed by Redis and is pretty fast even in a single process. At the time I found crawlers written in Python to be fairly slow, which is not a surprise. https://github.com/brendonboshell/supercrawler
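To illustrate the "mostly waiting" point, here is a rough sketch of keeping several downloads in flight from a single Node.js process. This is not supercrawler's API; `fakeDownload` and `crawlAll` are hypothetical names, and the download is simulated with a timer so the snippet runs without a network.

```javascript
// Hypothetical stand-in for an HTTP download: resolves after `ms` milliseconds.
function fakeDownload(url, ms = 20) {
  return new Promise((resolve) => setTimeout(() => resolve(`body of ${url}`), ms));
}

// Fetch every URL with at most `concurrency` downloads in flight at once.
// While one download is waiting, the event loop is free to start another.
async function crawlAll(urls, concurrency, download = fakeDownload) {
  const results = new Array(urls.length);
  let next = 0;        // index of the next URL to hand out
  let inFlight = 0;    // downloads currently waiting
  let maxInFlight = 0; // high-water mark, kept only for illustration

  async function worker() {
    while (next < urls.length) {
      const i = next++;
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      results[i] = await download(urls[i]);
      inFlight--;
    }
  }

  const count = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
  return { results, maxInFlight };
}
```

Calling `crawlAll(urls, 10)` returns the bodies in the same order as `urls`; swapping `fakeDownload` for a real HTTP client is left as an assumption.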
|
On the other hand, if you are building a massive crawler, you want to split crawling and parsing into two separate stages: do the network I/O in Go or C, and do the parsing with a headless browser that runs JavaScript, such as PhantomJS.
I don't really see any reason to use Python unless you haven't learned Go (the world's biggest crawler's own language).
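A minimal sketch of that fetch/parse split, written in JavaScript only to keep the thread's examples in one language. Everything here is an assumption for illustration: `PageQueue` is an in-memory stand-in for whatever shared queue sits between the stages, and the regex link extraction is a crude placeholder for a headless browser.

```javascript
// In-memory stand-in for the queue between the two stages.
class PageQueue {
  constructor() { this.items = []; }
  push(page) { this.items.push(page); }
  pop() { return this.items.shift(); } // undefined when empty
  get size() { return this.items.length; }
}

// Stage 1: network I/O only. Stores the raw response and does no parsing.
// `download` is injected so this stage could be replaced by a Go or C fetcher.
async function fetchStage(urls, download, queue) {
  for (const url of urls) {
    queue.push({ url, html: await download(url) });
  }
}

// Stage 2: parsing only. Drains the queue and extracts href values.
function parseStage(queue) {
  const links = [];
  let page;
  while ((page = queue.pop()) !== undefined) {
    for (const match of page.html.matchAll(/href="([^"]+)"/g)) {
      links.push({ from: page.url, to: match[1] });
    }
  }
  return links;
}
```

Because the stages only share the queue, each can be scaled or rewritten independently, which is the point of the split.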