Hacker News new | ask | show | jobs
by mcpar-land 294 days ago
My worst offender for scraping one of my sites was Anthropic. I deployed an ai tar pit (https://news.ycombinator.com/item?id=42725147) to see what it would do it with it, and Anthropic's crawler kept scraping it for weeks. I calculated the logs and I think I wasted nearly a year of their time in total, because they were crawling in parallel. Other scrapers weren't so persistent.
3 comments

For me it was OpenAI. GTPBot hammered my honeypot with 0.87 requests per second for about 5 weeks. Other crawlers only made up 2% of the traffic. 1.8 million requests, 4 GiB of traffic. Then it just abruptly stopped for whatever reason.
Tar pits and serve fake but legitimate looking content. Poison it.
That's hilarious. I need to set up one of these myself