by rb2k_
5653 days ago
Oh, node.js is definitely a great direction to go! One of my problems was that a lot of the "usual" libraries are written in a synchronous/blocking manner behind the scenes. That is something the node.js ecosystem would probably solve right from the start. The downside of a relatively new library like httpClient is that it is missing things like automatically following redirects. While this can be implemented in the crawler code, it complicates things. How big are the datasets that vertex.js/Tokyo Cabinet can handle for you? Node.js is on the list of things I'd like to play with a bit more (just like Scala, Erlang, graph databases, mirah, ...).
Is your crawler's source code available by any chance?
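To make the "implemented in the crawler code" part concrete, here is a rough sketch of the redirect decision a crawler would have to make for each response. It is not httpClient's API: `decideRedirect`, `RedirectDecision`, and the `maxHops` default are made-up names for illustration, and the actual HTTP request is assumed to happen elsewhere.

```typescript
// Hypothetical helper: given one HTTP response, decide whether the crawler
// should follow a redirect, stop, or give up. No network I/O happens here.
type RedirectDecision =
  | { kind: "follow"; url: string }
  | { kind: "done" }
  | { kind: "error"; reason: string };

// Status codes treated as redirects in this sketch.
const REDIRECT_STATUSES = new Set<number>([301, 302, 303, 307, 308]);

function decideRedirect(
  currentUrl: string,
  status: number,
  location: string | undefined,
  visited: ReadonlySet<string>, // URLs already requested in this chain
  maxHops: number = 5,          // assumed limit, tune as needed
): RedirectDecision {
  if (!REDIRECT_STATUSES.has(status)) {
    return { kind: "done" };
  }
  if (location === undefined) {
    return { kind: "error", reason: "redirect without Location header" };
  }
  if (visited.size >= maxHops) {
    return { kind: "error", reason: "too many redirects" };
  }

  let next: string;
  try {
    // Location may be relative, so resolve it against the current URL.
    next = new URL(location, currentUrl).toString();
  } catch {
    return { kind: "error", reason: "invalid Location header" };
  }

  // Guard against redirect loops.
  if (next === currentUrl || visited.has(next)) {
    return { kind: "error", reason: "redirect loop" };
  }
  return { kind: "follow", url: next };
}
```

The crawler would call this after every response, add the current URL to `visited`, and re-issue the request whenever it gets `follow`. Even in this small form it shows why the feature complicates things: relative locations, loops, and hop limits all need handling.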
|
Vertex.js can't quite keep up with TC, as it's written in JavaScript. However, it does let you batch writes into logical transactions, which you can use to get fairly high throughput.
The source isn't open, as it's fairly specific to my app, http://luciebot.com/. I'd be happy to chat about the details without releasing the source. richcollins@gmail.com / richcollins on freenode.
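For readers wondering what batching writes into logical transactions might look like on the client side, here is a minimal sketch. It does not use Vertex.js's real API: `WriteBatcher`, `batchSize`, and the `commit` callback are assumptions made for illustration, with `commit` standing in for whatever call sends one transaction to the store.

```typescript
// Hypothetical client-side batcher: buffers individual writes and hands
// them to a commit callback in groups, so each group can be sent as one
// logical transaction instead of one request per write.
class WriteBatcher<T> {
  private pending: T[] = [];
  private readonly batchSize: number;
  private readonly commit: (batch: T[]) => void;

  constructor(batchSize: number, commit: (batch: T[]) => void) {
    this.batchSize = batchSize;
    this.commit = commit;
  }

  // Queue one write; commit automatically once the batch is full.
  add(write: T): void {
    this.pending.push(write);
    if (this.pending.length >= this.batchSize) {
      this.flush();
    }
  }

  // Commit whatever is queued, e.g. at the end of a crawl step.
  flush(): void {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.commit(batch);
  }
}
```

The trade-off is the usual one: a larger `batchSize` means fewer round trips and higher throughput, but more writes are lost if the process dies before `flush()` runs.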