by averagewall
3073 days ago
You'd have to scrape slowly to mimic a real, slow user. At that point it might be cheaper to get Mechanical Turk workers to do it. That should get around IP rate limiting, captchas, and just about everything except the endless arms race: the site can still ask, "Why are so many people going directly to these same-formatted internal URLs without clicking through from random other places?" Then it can change the internal URLs and break everything all over again.
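
For what "scrape slowly" might look like in practice, here's a minimal Python sketch, assuming all you want is a random, human-ish pause between requests. The `paced` helper and the 5 to 30 second bounds are made up for illustration. They aren't tuned against any real rate limiter, and this does nothing about captchas or changed URLs.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional


def paced(
    urls: Iterable[str],
    min_delay: float = 5.0,   # hypothetical lower bound, in seconds
    max_delay: float = 30.0,  # hypothetical upper bound, in seconds
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield each URL only after waiting a random interval since the last one.

    The delay range is an illustrative assumption, not a value known to
    avoid any particular site's rate limiting.
    """
    rng = rng or random.Random()
    for i, url in enumerate(urls):
        if i:  # no wait before the first request
            sleep(rng.uniform(min_delay, max_delay))
        yield url
```

You'd loop over `paced(urls)` and fetch each URL inside the loop. Passing `sleep` in as a parameter just makes the pacing easy to test without actually waiting.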

Recap [1] does something similar to extract PACER court documents, which are public domain but whose access is restricted by draconian public policy.
[1] https://free.law/recap/