
Hacker News


	by Trung0246 313 days ago
	One easy way to bypass this is to let an external service that already fetches robots.txt (archive.org, GitHub Actions, etc.) cache the file, then pass it on to the actual scrape server through a separate API, a webhook, or a manual download. A robots.txt file is usually small, so fetching it this way would not alert the external services.
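	A minimal Python sketch of the relay idea, assuming the Wayback Machine (archive.org) as the external cache. The helper names `wayback_robots_url`, `fetch_cached_robots` and `build_parser` are hypothetical, and the `<timestamp>id_/` URL form is assumed to return the raw archived file from the capture nearest that timestamp. The scrape server only ever sees the relayed text and still honours the rules via the standard library's `urllib.robotparser`.

```python
import urllib.request
import urllib.robotparser


# Hypothetical helper: Wayback Machine URL for a site's archived robots.txt.
# Assumption: "<timestamp>id_/" requests the raw bytes of the capture
# closest to `timestamp` (YYYYMMDDhhmmss, possibly truncated, e.g. "20240101").
def wayback_robots_url(site: str, timestamp: str) -> str:
    return f"https://web.archive.org/web/{timestamp}id_/{site.rstrip('/')}/robots.txt"


# Fetch the cached copy from archive.org rather than from the target site,
# so the target's server never sees a robots.txt request.
def fetch_cached_robots(site: str, timestamp: str, timeout: float = 10.0) -> str:
    url = wayback_robots_url(site, timestamp)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# On the scrape server: feed the relayed text to the stdlib parser.
def build_parser(robots_text: str) -> urllib.robotparser.RobotFileParser:
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


# Example usage (needs network access):
#   text = fetch_cached_robots("https://example.com", "20240101")
#   rules = build_parser(text)
#   rules.can_fetch("my-scraper", "https://example.com/some/page")
```

	How the text travels from the fetcher to the scraper (an API, a webhook, a file drop) is up to you; only `build_parser` has to run on the scrape server.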