by Trung0246
313 days ago
One easy way to bypass this is to let an external service (archive.org, GitHub Actions, etc.) fetch and cache robots.txt, then expose the cached copy to the actual scraping server through a separate API, a webhook, or a manual download. A robots.txt file is usually small, so fetching it would not alert the external service.
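A rough sketch of the last step, assuming the cached robots.txt has already reached the scraping server as plain text by whatever route. The `CACHED` contents, the `example-bot` user agent, and the `load_cached_robots` helper are all made up for illustration; only `urllib.robotparser` is from the Python standard library. The point is that the rules get parsed locally, so the scraping server never requests robots.txt from the target site itself.

```python
from urllib.robotparser import RobotFileParser


def load_cached_robots(robots_text: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was fetched elsewhere."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser


# Hypothetical cached copy, e.g. saved from an archive snapshot or a CI job.
CACHED = """\
User-agent: *
Disallow: /private/
"""

parser = load_cached_robots(CACHED)
print(parser.can_fetch("example-bot", "https://example.com/private/page"))  # False
print(parser.can_fetch("example-bot", "https://example.com/public/page"))   # True
```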