by luckylion 242 days ago
If they even are AI crawlers. They could just as well be exploit scanners searching for endpoints to attack. That wouldn't require storing the content, only the links.

If you look at which pages are hit, and how many pages any one address hits in a given period of time, it's pretty easy to identify features that are reliable proxies for, e.g., exploit scanners, trawlers, and agents. I publish a feed of what's being hit on my servers; contact me for details (you need to be able to make DNS queries to a particular server, for a domain that is not reachable from ICANN's root).
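
To make the idea concrete, here is a rough sketch of that kind of per-address feature extraction. It is not what I actually run: the input shape (`(ip, unix_timestamp, path)` tuples), the `PROBE_MARKERS` list, the function names, and every threshold are assumptions picked for illustration, and real traffic would need tuning.

```python
from collections import defaultdict

# Hypothetical markers for paths that exploit scanners tend to probe.
# Illustrative only; a real list would be built from observed traffic.
PROBE_MARKERS = ("/.env", "/wp-login.php", "/phpmyadmin", "/.git/")


def summarize(hits, window_seconds=60):
    """Group hits by address and compute simple per-address features.

    hits: iterable of (ip, unix_timestamp, path) tuples (assumed format).
    Returns a dict mapping ip -> feature dict.
    """
    by_ip = defaultdict(list)
    for ip, ts, path in hits:
        by_ip[ip].append((ts, path))

    features = {}
    for ip, rows in by_ip.items():
        rows.sort()
        times = [ts for ts, _ in rows]

        # Peak number of requests inside any sliding window of window_seconds.
        peak, start = 0, 0
        for end in range(len(times)):
            while times[end] - times[start] >= window_seconds:
                start += 1
            peak = max(peak, end - start + 1)

        paths = [p for _, p in rows]
        probes = sum(
            1 for p in paths if any(m in p.lower() for m in PROBE_MARKERS)
        )
        features[ip] = {
            "peak_per_window": peak,
            "distinct_paths": len(set(paths)),
            "probe_ratio": probes / len(paths),
        }
    return features


def label(f, probe_threshold=0.5, crawl_paths=50):
    """Very crude labelling; both thresholds are arbitrary assumptions."""
    if f["probe_ratio"] >= probe_threshold:
        return "exploit-scanner"
    if f["distinct_paths"] >= crawl_paths:
        return "crawler"
    return "unclassified"
```

Feeding it parsed access-log rows and calling `label(features[ip])` per address gives a first cut. An address that mostly requests probe paths looks like a scanner, while one that walks many distinct pages looks like a crawler, whatever it claims in its user agent.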