Hacker News
by outloudvi 428 days ago
Vercel has a fairly generous free quota but a decidedly expensive paid pricing scheme. I think people still remember https://service-markup.vercel.app/.

As for the crawling problem, I want to wait and see whether robots.txt proves to be enough to stop GenAI bots from crawling, since I confidently believe these GenAI companies are too "well-behaved" to respect robots.txt.

2 comments

This is my experience with AI bots. This is my robots.txt:

User-agent: *
Crawl-Delay: 20

Clear enough. Google, Bing and others respect the limits, and while about half my traffic is bots, they never DoS the site.
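For anyone wondering what "respecting the limits" looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The function name crawl_policy, the "ExampleBot" user agent and the example.com URL are made up for illustration; this is not how any particular bot is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the comment above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 20
"""


def crawl_policy(robots_txt: str, user_agent: str, url: str) -> tuple[bool, float]:
    """Return (allowed, delay_seconds) for user_agent fetching url.

    Hypothetical helper: a polite crawler would call something like this
    before each request and sleep for delay_seconds between fetches.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return parser.can_fetch(user_agent, url), float(delay or 0)


allowed, delay = crawl_policy(ROBOTS_TXT, "ExampleBot", "https://example.com/page")
print(allowed, delay)
```

A crawler that honors this would wait 20 seconds between requests to the site, which is roughly the behavior described for Google and Bing above.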

When a very well-known AI bot crawled my site in August, they set off everything: fail2ban put them in jail temporarily multiple times, the nginx per-IP request limit was serving 426 and 444 to more than half their requests (but they kept hammering the same URLs), and some human users contacted me to complain about the site returning 503. I had to block the bot's IPs at the firewall. They ignore (if they even read) the robots.txt.

Nope, they have been ignoring robots.txt since the start. There are multiple posts about it all over the internet.