HN Mirror


by edmundsauto 1957 days ago

For these sites, I crawl using a JS-powered engine and just save the relevant page content to disk.

Once I have the data stored locally, I can craft my regexes, selectors, etc. against it.

This helps if you get caught and shut down: it won't stop your development effort, and you can set up proxying of requests as a separate task.
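
A rough sketch of that workflow in Python, in case it helps. Everything named here is made up for illustration: `render` stands in for whatever JS-capable engine you use (it just takes a URL and returns the rendered HTML), and `fetch_cached`, `cache_path`, and the `<h2>` regex are placeholders, not anything from a particular library or site.

```python
import hashlib
import re
from pathlib import Path
from typing import Callable


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name inside the cache directory."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def fetch_cached(url: str, render: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page HTML, hitting the site only if it is not on disk yet.

    `render` is a hypothetical callable wrapping your headless browser.
    """
    path = cache_path(cache_dir, url)
    if path.exists():
        # Already crawled: read the saved copy, no request is made.
        return path.read_text(encoding="utf-8")
    html = render(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html


def extract_titles(html: str) -> list[str]:
    """Example extractor; tweak and rerun it against the saved pages."""
    return re.findall(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S)
```

The point of passing `render` in as a parameter is that the crawl side (including any proxying you add later) can change without touching the extraction code, and the extractors can be rerun against the saved files as often as you like without making a single request.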