by fireant 852 days ago
I've found myself writing the same session, proxy, rate-limiting, and header-faking management code over and over for my scrapers. I've extracted it into its own service that runs in Docker and acts as a MITM proxy between you and the target. It is client-language agnostic, so you can write scrapers in Python, Node, or whatever and still have great performance.

I highly recommend this approach: it lets you separate infrastructure code, which gets highly complex as you need more requests, from the actual spider/parser code, which is usually pretty straightforward and project-specific.
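
To illustrate the split, here is a rough sketch of what the scraper side might look like once the infrastructure lives behind a proxy. The proxy address `http://localhost:8080` and the helper names `build_opener`, `fetch`, and `parse_title` are my own assumptions for illustration, not taken from the project; check its README for the real port and any HTTPS certificate setup.

```python
import re
import urllib.request

# Hypothetical address of the proxy container; an assumption, not the
# project's documented default.
PROXY_URL = "http://localhost:8080"


def build_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, opener: urllib.request.OpenerDirector, timeout: float = 30.0) -> bytes:
    """Fetch a page. Sessions, rate limiting and headers are left to the proxy."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def parse_title(html: bytes) -> str:
    """Project-specific parsing: here, just pull out the <title> text."""
    match = re.search(rb"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    return match.group(1).decode("utf-8", errors="replace").strip()
```

With the proxy running, usage would be something like `parse_title(fetch("https://example.com", build_opener()))`. The scraper itself contains no retry, rotation, or throttling logic, which is the point of the approach.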

https://github.com/jkelin/forward-proxy-manager

1 comment

This is great, was totally in the back of my mind as a next step.