A well-written scraper looks like normal traffic in the same way that a well-written pseudorandom number generator looks like a true random number generator. It'll fool your eye, but not statistical analysis.
Think about the goal of a scraper: it needs to actually walk through all the content. That doesn't look like a normal user at all. An individual request might look OK, but in aggregate the pattern of a robot pops out.
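To make the "in aggregate" point concrete, here is a rough sketch of one such aggregate signal: the fraction of the site's distinct pages each client has requested. The function names, the `(client, path)` log shape, and the 0.5 threshold are all made up for illustration; this is not the method from any particular paper or product, just a toy.

```python
from collections import defaultdict

def coverage_by_client(requests, total_pages):
    """Map each client to the fraction of distinct pages it requested.

    `requests` is an iterable of (client_id, path) pairs, a hypothetical
    simplification of a real access log.
    """
    seen = defaultdict(set)
    for client, path in requests:
        seen[client].add(path)
    return {client: len(paths) / total_pages for client, paths in seen.items()}

def flag_crawlers(requests, total_pages, threshold=0.5):
    """Return clients whose coverage meets an (arbitrary) threshold."""
    coverage = coverage_by_client(requests, total_pages)
    return sorted(c for c, frac in coverage.items() if frac >= threshold)

# Toy log: a 10-page site, one human-like client and one that walks everything.
pages = [f"/p{i}" for i in range(10)]
log = [("human", "/p0"), ("human", "/p1"), ("human", "/p0")]
log += [("bot", p) for p in pages]

flagged = flag_crawlers(log, total_pages=len(pages))
```

Each of the bot's requests looks unremarkable on its own; only the per-client total gives it away. A real system would presumably need more than a per-client count, since this version keys on a single client identifier.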
So, would that detection mechanism be able to deal with a number of coordinated scrapers rotating through lists of proxies, using different User-Agent strings, and inserting (pseudo-)random delays between requests?
https://www.usenix.org/conference/usenixsecurity12/pubcrawl-...