by jeroenhd 665 days ago
I recently found out that Bytedance was scraping a website of mine over and over again. I don't care about their stupid AI crawler scanning my cheapo server, but they were hitting the same files from different IP addresses, all from the same /56 China Telecom subnet.

I added a firewall rule to block the subnet and that seems to have worked. Earlier attempts with robots.txt failed, and when I blocked the bots in Nginx instead, my logs still got spammed by all the HTTPS requests.
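For anyone wanting to check whether scattered client addresses really sit in one subnet before writing a firewall rule, here is a rough Python sketch. It only uses the standard `ipaddress` module. The function name `group_by_prefix` and the sample addresses (taken from the 2001:db8::/32 documentation range) are made up for illustration and are not the actual addresses from the logs.

```python
import ipaddress
from collections import Counter


def group_by_prefix(addresses, prefix_len=56):
    """Count how many IPv6 addresses fall into each /prefix_len network.

    Hypothetical helper: IPv4 addresses are skipped, since a /56 only
    makes sense as an IPv6 prefix here.
    """
    counts = Counter()
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if ip.version != 6:
            continue
        # strict=False masks off the host bits instead of raising an error.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


# Placeholder addresses from the documentation range, not real crawler IPs.
sample = [
    "2001:db8:0:1::1",
    "2001:db8:0:2::5",
    "2001:db8:0:ff::9",
    "2001:db8:1:0::1",
]

for network, hits in group_by_prefix(sample).most_common():
    print(network, hits)
```

If one network dominates the counts, that prefix is the candidate for a single firewall rule rather than a long list of individual addresses.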

I don't understand how you could write a scraper like that and not notice that you're downloading the same files over and over again.
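To illustrate the kind of bookkeeping that would catch this, here is a minimal sketch of URL deduplication in Python. Nothing is known here about how the crawler in question actually works; the `SeenUrls` class is hypothetical, and a real distributed crawler would need to share this state between workers (which may be exactly what goes wrong when requests arrive from many different IPs).

```python
from urllib.parse import urldefrag


class SeenUrls:
    """Hypothetical in-memory record of URLs that were already fetched."""

    def __init__(self):
        self._seen = set()

    def should_fetch(self, url):
        # Drop the #fragment, since it never reaches the server anyway.
        key, _fragment = urldefrag(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


seen = SeenUrls()
for url in [
    "https://example.com/a",
    "https://example.com/a#top",
    "https://example.com/b",
]:
    print(url, "fetch" if seen.should_fetch(url) else "skip")
```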