

by ItsBob 230 days ago

> It's interesting to study, right?

Definitely! I wasn't experiencing any issues; hell, it wasn't even for public consumption at that time, so no great loss to me. But I found a few things fascinating (and somewhat stupid!) about it:

1. The sheer number of automated requests to scrape my content

2. That a massive number of the bots openly had "bot" or some derivative in the user agent and they were accessing a page I'd explicitly denied! :D

3. That an equally large number were faking their user agents to look like regular users and still hitting a page that a regular user couldn't possibly ever hit!
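For anyone who wants to try the same split between points 2 and 3, here is a rough sketch of how the trap hits could be bucketed. It is not what I actually ran: `TRAP_PATH`, the `BOT_UA` pattern and `classify_hit` are all made-up names, and the substring list is just a guess at what self-identifying crawlers tend to put in their user agent.

```python
import re

# Hypothetical trap path: listed under Disallow in robots.txt and linked nowhere,
# so no regular visitor should ever request it.
TRAP_PATH = "/trap-3f9a1c/"

# Assumed substrings that self-identifying crawlers commonly put in their user agent.
BOT_UA = re.compile(r"bot|crawl|spider|scrape", re.IGNORECASE)


def classify_hit(path, user_agent):
    """Return None for ordinary traffic, otherwise a label for a trap hit."""
    if path != TRAP_PATH:
        return None
    if BOT_UA.search(user_agent or ""):
        # Admits to being a bot, but ignored the robots.txt rule anyway.
        return "declared-bot"
    # Browser-like user agent on a page a regular user could not have reached.
    return "disguised-bot"
```

Anything that comes back non-`None` hit the disallowed page, so both labels are candidates for a ban; the label only records whether the client was honest about what it is.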

Something I did notice, though only towards the end and I didn't pursue it (I should log it better next time for analysis!): the endpoint was dynamically generated and only existed in the robots.txt for a short time. Yet I caught bots later on, long after that auto-generated page was created (and after the IP was banned), that still went for that same page. Clearly the same entities!
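If I do this again, something like the following is roughly how I'd track it. Treat it as a sketch under assumptions, not my original setup: `TrapRegistry` and its methods are hypothetical, and the trap paths are just random hex tokens. The idea is to remember every path that was ever published in robots.txt, so a request for an old one can be flagged as coming from something that read an earlier copy of the file.

```python
import secrets
import time


class TrapRegistry:
    """Remembers every trap path ever published in robots.txt and when it was issued."""

    def __init__(self):
        self.issued = {}     # trap path -> timestamp when it was generated
        self.current = None  # the path currently listed in robots.txt

    def rotate(self, now=None):
        """Generate a fresh random trap path and make it the current one."""
        path = "/" + secrets.token_hex(8) + "/"
        self.issued[path] = time.time() if now is None else now
        self.current = path
        return path

    def robots_txt(self):
        """Render a robots.txt that disallows only the current trap path."""
        if self.current is None:
            self.rotate()
        return f"User-agent: *\nDisallow: {self.current}\n"

    def check(self, path, now=None):
        """Return None for non-trap paths, else (kind, age_in_seconds).

        kind is "current" for the live trap and "stale" for a retired one,
        which means the client learned the path from an older robots.txt.
        """
        if path not in self.issued:
            return None
        now = time.time() if now is None else now
        kind = "current" if path == self.current else "stale"
        return kind, now - self.issued[path]
```

A "stale" hit arriving from an IP that was never banned would be the interesting case to log: it ties that request to whoever fetched the earlier robots.txt, even though the address changed.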

My spidey senses are tingling. Next time, I'm going to log the shit out of these requests and publish as much as I can for others to analyse and dissect... might be interesting.