Good article! I been doing scraping for the last 10 years and I've seen a lots of differents things to try to avoid us.
Also, I'm in the other side protecting websites to ban scrapers, so funny!
I'm in the same position for the first time (protecting against scraping) and honestly I'm kind of blind right now. Which is weird because of how much scraping I've done (okay not that much). Any tips or tricks or blogs you know of off the top of your head for protecting your site?
Virtually everything can be easily defeated. The only outfit I've consistently seen put up a good fight is Distil. They do it by acting a little like Cloudflare. They put their servers in front of your www facing endpoints and use ML to mine their global client traffic to identify bot signals (aided by some aggressive in-browser javascript fingerprinting).
Yeah, Distil is the first outfit I've encountered where they've got the model to make it really hard to reliably bypass. It comes down to "I can spend a significant amount of time trying to bypass this, and I would, but they would likely identify and block me again within a few weeks at most.", and it's not worth it when it's only part of what I need to do to scrap some data, and it's their entire job, and they can afford to hire multiple people.
The economics are in their favor, and I make it a point not to fight economics when I recognize them, it's rarely sustainable.
After the years, I've arrived at the conclusion that everything can be scrapped. What you have to do is try to put as many walls as you can. But if someone really wants to crawl your site, with the right knowledgement he will able to do it despite of all your walls.