Hacker News
by chewxy 4931 days ago
Let's examine your motivation: why do you want to block said scrapers in the first place? SEO concerns (dupe content)?

Mostly duplicate content, and it's messing up my analytics (increased bounce rate, decreased time spent on page, etc.).
It seems to be Selenium (judging from the referer), so it executes the JS too. Since we are using Google Analytics, those visits get counted.
Why not exclude the EC2 IP range from analytics?
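Because the scraper runs the page's JavaScript, the exclusion has to happen on the server, before the analytics tag is sent. Here is a minimal sketch that assumes you control the page template and simply omit the tag for matching IPs. The prefixes below are hypothetical placeholders (RFC 5737 documentation ranges), not real EC2 ranges. AWS publishes its current ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json (entries with an `ip_prefix` and a `service` such as `"EC2"`), and you would load those instead. `is_excluded` and `analytics_snippet` are illustrative names, not part of any library.

```python
import ipaddress

# Hypothetical placeholder prefixes (RFC 5737 documentation ranges), NOT real EC2 ranges.
# In practice, load the "ip_prefix" values whose "service" is "EC2" from AWS's ip-ranges.json.
EXCLUDED_PREFIXES = ["203.0.113.0/24", "198.51.100.0/24"]

# Parse once up front so each request is just a containment check.
EXCLUDED_NETWORKS = [ipaddress.ip_network(p) for p in EXCLUDED_PREFIXES]


def is_excluded(client_ip: str) -> bool:
    """Return True if client_ip falls inside any excluded network.

    Comparing an IPv6 address with an IPv4 network returns False, so mixed traffic is safe.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in EXCLUDED_NETWORKS)


def analytics_snippet(client_ip: str, snippet: str) -> str:
    """Return the analytics tag for normal visitors and an empty string for excluded IPs."""
    return "" if is_excluded(client_ip) else snippet


if __name__ == "__main__":
    tag = "<script>/* analytics tag */</script>"
    print(is_excluded("203.0.113.7"))               # inside a placeholder range
    print(repr(analytics_snippet("192.0.2.1", tag)))  # normal visitor keeps the tag
```

Cloud IP ranges change over time, so the list would need periodic refreshing. This approach also drops legitimate visitors who happen to browse from EC2 (for example, through a proxy hosted there).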
What analytics are you using? I thought most would use JavaScript to avoid problems like this, and I would wager that the vast majority of bots don't bother executing JavaScript. You will always have legit bots hitting your site as well.
Selenium executes JavaScript, as do WebKit-based scrapers like PhantomJS.