by jonathanmh 3704 days ago
Hi Jaruzel, the crawler is live :D You can see at the bottom which page it is crawling right now.

It basically scrapes all the links on every page it hits and checks whether the response headers contain the clacks value.
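
For anyone curious, here is a rough sketch of those two steps, not the actual crawler code. It assumes the "clacks value" lives in the `X-Clacks-Overhead` response header, and it uses a plain regex where the real thing would use a proper HTML parser. The names `getClacksValue` and `extractLinks` are made up for this example.

```javascript
// Hypothetical helpers for illustration only.

// Assumed header name for the "clacks value"; compared case-insensitively.
const CLACKS_HEADER = "x-clacks-overhead";

// Return the clacks header value, or null if the response does not carry it.
// `headers` is a plain object mapping header names to values.
function getClacksValue(headers) {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === CLACKS_HEADER) {
      return String(value);
    }
  }
  return null;
}

// Collect unique absolute http(s) links from anchor tags in an HTML string.
// The regex is a crude stand-in for a real HTML parser.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      // Resolve relative hrefs against the page they were found on.
      const url = new URL(match[1], baseUrl);
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // fragments point at the same page
        links.add(url.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}
```

A crawler loop would fetch a page, pass its response headers to `getClacksValue`, and push whatever `extractLinks` returns onto the queue of pages still to visit.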

I added the form so people can submit pages and help the list grow faster, even though I believe the crawler would eventually get to their pages anyway :D

1 comment

Cool! What software are you using for the crawler?
I built a fairly basic one with Node.js and MongoDB :)

Mainly in use in the crawler:
* request
* cheerio