Hacker News
by forlorn 4164 days ago
My recipe is Typhoeus (https://github.com/typhoeus/typhoeus) + Nokogiri. I have tried lots of other options, including EventMachine with em-http-request and its reactor loop, and concurrent-ruby (both are very poorly documented).

Typhoeus has a built-in concurrency mechanism based on callbacks, with a configurable number of concurrent HTTP requests. You just create a hydra object, then create the first request object with a URL and a callback (you have to check for errors like 404 yourself) in which you extract further URLs from the page and push them to the hydra again, with the same or another callback.

1 comment

Just said this myself. I love Typhoeus, though I can't spell it 9/10 times.