:/ Common Crawl archives robots.txt and indicates that the file at wiki.diasporafoundation.org was unchanged in November and December from what it is now. Unchanged from September, in fact.
they ingested it twice since I deployed it. they still crawl those URLs - and I'm sure they'll continue to do so - as others in that thread have confirmed exactly the same. I'll be traveling for the next couple of days, but I'll check the logs again when I'm back.
of course, I'll still see accessed from them, as most others in this thread do, too, even if they block them via robots.txt. but of course, that won't stop you from continuing to claim that "I lied". which, fine. you do you. luckily for me, there are enough responses from other people running medium-sized web stuffs with exactly the same observations, so I don't really care.
Here's something for the next time you want to "expose" a phony: before linking me to your investigative source, ask for exact date-stamps when I made changes to the robots.txt and what I did, as well as when I blocked IPs. I could have told you those exactly, because all those changes are tracked in a git repo. If you asked me first, I could have answered you with the precise dates, and you would have realized that your whole theory makes absolutely no sense. Of course, that entire approach is mood now, because I'm not an idiot and I know when commoncrawl crawls, so I could easily adjust my response to their crawling dates, and you would of course claim I did.
So I'll just wear my "certified-phony-by-orangesite-user" badge with pride.
Gentleman’s bet. If you can accurately predict the day of four of the next six months of commoncrawls crawl, I’ll donate $500 to the charity of your choice. Fail to, donate $100 to the charity of my choice.
(and it's not like they only check robots.txt once a month or so. https://stuff.overengineer.dev/stash/2024-12-30-dfwiki-opena...)