Hacker News
by Trung0246 313 days ago
One easy way to bypass this is to let external services that fetch robots.txt (archive.org, GitHub Actions, etc.) cache it, and then expose the cached copy to the actual scrape server through a separate API, a webhook, or a manual download.

A robots.txt file is usually small, so fetching it this way would not alert the external services.
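
A rough sketch of the archive.org variant, in case it helps picture it. The assumptions are mine: it uses the Wayback Machine availability endpoint to find the latest snapshot, and the `id_` URL modifier to ask for the unmodified body. The helper names (`cached_robots_txt`, `parse_rules`, `http_get`) are made up for illustration, and `fetch` is injectable so the scrape server could just as well read from a webhook payload or a manually downloaded file.

```python
import json
import urllib.parse
import urllib.request
from urllib.robotparser import RobotFileParser

# Assumption: the Wayback Machine availability endpoint is the lookup service.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def http_get(url, timeout=10):
    """Plain HTTP GET that returns the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def cached_robots_txt(site, fetch=http_get):
    """Return robots.txt for `site` from an archived snapshot, or None.

    The target site itself is never contacted; only the archive is.
    """
    robots_url = f"{site.rstrip('/')}/robots.txt"
    query = urllib.parse.urlencode({"url": robots_url})
    info = json.loads(fetch(f"{WAYBACK_AVAILABILITY}?{query}"))

    closest = info.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no snapshot is cached yet

    # Assumption: the "id_" modifier returns the snapshot body as archived,
    # without the replay toolbar or link rewriting.
    raw_url = f"https://web.archive.org/web/{closest['timestamp']}id_/{robots_url}"
    return fetch(raw_url)


def parse_rules(robots_text):
    """Build a standard-library parser from already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser
```

On the scrape server, something like `parse_rules(cached_robots_txt("https://example.com")).can_fetch("mybot", url)` would then answer questions about the rules without the server ever requesting robots.txt from the target itself (after checking that a snapshot was actually found).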