by Pinbenterjamin
2525 days ago
According to the NDA with my company, I can't reveal anything about the architecture beyond the fact that it is hosted locally on a homebuilt distributed system that randomly chooses from a pool of 120 residential IPs. We do have human emulation routines that help avoid most detection, and that library is decoupled in such a way that we can edit behavior down to the individual site. Some sites are just so damn good at detecting us and I just don't get it.

The countermeasure would be to have a bunch of humans use the websites in any way they want, totally undirected, then use the totality of that browsing to drive your scraping probabilistically. It would be less efficient, but very difficult to catch.
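
To make "probabilistically" a bit more concrete, here is a minimal sketch of one way to read that idea, not anything a real system is known to do: treat the recorded human sessions as a first-order Markov chain over pages and sample navigation paths from it. The session format (a list of page paths per visit) and the names `build_transitions` and `sample_path` are assumptions made up for this illustration.

```python
import random
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count page-to-page transitions seen in recorded human sessions.

    `sessions` is assumed to be a list of visits, each a list of page
    paths in the order a person viewed them (hypothetical format).
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for here, there in zip(session, session[1:]):
            counts[here][there] += 1
    return counts


def sample_path(counts, start, max_steps, rng):
    """Walk the transition counts, picking each next page with
    probability proportional to how often humans made that move."""
    path = [start]
    current = start
    for _ in range(max_steps):
        options = counts.get(current)
        if not options:
            break  # no human ever navigated onward from this page
        pages = list(options)
        weights = [options[page] for page in pages]
        current = rng.choices(pages, weights=weights, k=1)[0]
        path.append(current)
    return path


if __name__ == "__main__":
    recorded = [
        ["/", "/search", "/item/1"],
        ["/", "/about"],
        ["/", "/search", "/item/2", "/search"],
    ]
    transitions = build_transitions(recorded)
    print(sample_path(transitions, "/", 5, random.Random()))
```

The inefficiency mentioned above shows up directly: the walk only reaches pages humans actually visited, and it spends requests on pages you may not care about because that is what the recorded traffic did.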