by Xeoncross 3402 days ago
JavaScript is the language to build a smaller crawler in, because web pages run JavaScript and your crawler needs to as well. A C++ or Golang crawler will be a little faster and use a lot less memory, but you have to compile in WebKit and do a bunch of hacky stuff to run the page's JavaScript.

On the other hand, if you are building a massive crawler, you want to split the crawling and parsing into two separate stages: do the network I/O in Golang or C, and do the parsing with a JavaScript-capable headless browser like PhantomJS.
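
To make the split concrete, here is a minimal sketch of the fetch stage in Go. Everything named here (`Page`, `FetchFunc`, `FetchAll`, the worker count, the example URLs) is made up for illustration, and the fetcher is a stand-in so it runs offline; a real one would wrap `net/http`. The point is only that fetching produces a stream of pages that a separate parsing stage can consume.

```go
package main

import (
	"fmt"
	"sync"
)

// Page is a fetched document waiting for the parsing stage.
// (Hypothetical type, for illustration only.)
type Page struct {
	URL  string
	Body string
	Err  error
}

// FetchFunc abstracts the network call so the stage can run offline.
type FetchFunc func(url string) (string, error)

// FetchAll starts `workers` goroutines that fetch the given URLs and
// stream the results on the returned channel. The channel is closed
// once every URL has been processed. Result order is not guaranteed.
func FetchAll(urls []string, fetch FetchFunc, workers int) <-chan Page {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(chan Page)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				body, err := fetch(u)
				out <- Page{URL: u, Body: body, Err: err}
			}
		}()
	}

	// Feed the workers, then close the output once they have all finished.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	// Stand-in fetcher; a real crawler would do an HTTP GET here.
	fake := func(url string) (string, error) {
		return "<html>" + url + "</html>", nil
	}
	urls := []string{"https://example.com/a", "https://example.com/b"}

	for page := range FetchAll(urls, fake, 2) {
		if page.Err != nil {
			fmt.Println("fetch failed:", page.URL, page.Err)
			continue
		}
		// In the split design, this is where the body would be handed
		// off to the separate parsing stage (e.g. a headless browser).
		fmt.Printf("fetched %s (%d bytes)\n", page.URL, len(page.Body))
	}
}
```

In a larger setup the consumer loop would probably push bodies onto a queue that the headless-browser workers read from, so the two stages can scale independently.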

I don't really see any reason to use Python unless you haven't learned Golang (the language created at Google, which runs the world's biggest crawler).

1 comment

No one else has mentioned it, but evaluating JavaScript from arbitrary web pages is something you need to be very careful about.
Care to elaborate?