by dchuk
21 days ago
I presume you’re politely asking in order to block? Which is fine, I get it. I’m on my phone right now but can update later. I do want to ask, though (and I should make this clear in a FAQ or something): the way I check RSS feeds uses adaptive scheduling, so I intentionally don’t poll any site’s feed too frequently. The summarization is based on the full article content, but I never render that full content on the site (to avoid traffic-hijacking concerns). Given that, what’s the concern?
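For readers unfamiliar with the term, here is a minimal Python sketch of what adaptive feed scheduling can look like. It is an illustration, not the poster's actual implementation: the function name `next_interval`, the halve/double rule, and the 15-minute and 24-hour bounds are all assumptions.

```python
from datetime import timedelta

# Assumed bounds, chosen only for illustration.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=24)


def next_interval(current: timedelta, feed_changed: bool) -> timedelta:
    """Return how long to wait before polling a feed again.

    Hypothetical rule: poll sooner when the last check found new items,
    back off when it did not, and always stay within the bounds above.
    """
    proposed = current / 2 if feed_changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))
```

With this rule a feed that rarely updates drifts toward one check per day, while a busy feed is still never polled more often than every 15 minutes.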
|
That said, I'm not necessarily planning to block your crawlers immediately; I intend to just add them to a list I maintain for personal reference. I'm mostly interested in correlating the crawling traffic I see with its sources. I have been gathering data about crawling activity and sources, which I display on an embedded map on my site. I have Caddy annotate traffic with a header indicating what the crawler is, and if a fleet behaves nicely, it doesn't get added to the blocklist.