by pooriaazimi 5063 days ago
Please do release it! I'm (or was; I decided to go with Apache Nutch for the time being) in the process of creating a similar crawler (with almost the exact same "technologies" you mentioned). It would save me a lot of time, and we might be able to help with fixing bugs and adding features...

OK, I'll work on writing documentation and adding some tests, then. The project is written in CoffeeScript, and someone only needs to extend a class, implement two methods, and provide a list of starting URLs. Using Node's cluster module and concurrent connections, I think it can scale very well. I introduced promises (taken from jQuery Deferred) in case someone wants the writes to the DB to be synchronous.
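
To make the "extend a class, implement two methods, list the starting URLs" idea concrete, here is a rough sketch in TypeScript rather than CoffeeScript. It is not the actual project code: the names `Crawler`, `extractLinks`, `save`, `startUrls`, and `fetchPage` are all made up for illustration, pages come from an in-memory map instead of the network, and the cluster part is left out. The promise returned by `save` is what lets the crawler wait for the DB write before moving on.

```typescript
// Hypothetical sketch: none of these names come from the original project.
type Fetcher = (url: string) => Promise<string>;

abstract class Crawler {
  // Subclasses provide the list of starting URLs.
  abstract readonly startUrls: string[];

  private fetchPage: Fetcher;
  private concurrency: number;

  constructor(fetchPage: Fetcher, concurrency: number = 2) {
    this.fetchPage = fetchPage;
    this.concurrency = concurrency;
  }

  // Method 1: return the URLs to visit next, given a fetched page.
  abstract extractLinks(url: string, body: string): string[];

  // Method 2: persist the page. The crawler awaits the returned promise,
  // so the write finishes before this worker picks up its next URL.
  abstract save(url: string, body: string): Promise<void>;

  async run(): Promise<string[]> {
    const queue = [...this.startUrls];
    const seen = new Set(queue);
    const visited: string[] = [];

    // Each worker drains the shared queue. A worker that finds the queue
    // empty exits, so the effective concurrency can drop below the limit.
    const worker = async (): Promise<void> => {
      while (queue.length > 0) {
        const url = queue.shift()!;
        const body = await this.fetchPage(url);
        await this.save(url, body);
        visited.push(url);
        for (const link of this.extractLinks(url, body)) {
          if (!seen.has(link)) {
            seen.add(link);
            queue.push(link);
          }
        }
      }
    };

    await Promise.all(Array.from({ length: this.concurrency }, worker));
    return visited;
  }
}

// Fake "site": each page body is a space-separated list of links.
const pages: Record<string, string> = {
  "http://example.test/a": "http://example.test/b http://example.test/c",
  "http://example.test/b": "http://example.test/c",
  "http://example.test/c": "",
};

class DemoCrawler extends Crawler {
  readonly startUrls = ["http://example.test/a"];
  stored = new Map<string, string>(); // stands in for the DB

  extractLinks(_url: string, body: string): string[] {
    return body.split(" ").filter((s) => s.length > 0);
  }

  async save(url: string, body: string): Promise<void> {
    this.stored.set(url, body);
  }
}

const demo = new DemoCrawler(async (url) => pages[url] ?? "");
```

Calling `demo.run()` returns a promise for the list of visited URLs. Swapping the fake fetcher for a real HTTP client and `save` for a real DB call would be the obvious next step, but that is beyond what this sketch shows.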

IMO, using Kue was a success because it also offers a web interface where you can check progress and inspect or restart failed jobs.

Great - I'll be looking forward to it. What's your GitHub username (if you intend to publish it there) so I can follow you to be notified when it's released?