I find it unethical for a website's robots.txt to allow-list particular search engines and ban all others. Essentially, you are colluding with established search providers.
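For anyone unsure what such an allow-list looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `can_crawl` helper, the `example.com` URL, and the `NewSearchBot` name are all made up for illustration; `Googlebot` just stands in for "an established crawler".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list policy: one named crawler may fetch everything,
# every other user agent is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if the sample policy above lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "NewSearchBot"):
        print(agent, can_crawl(agent))
```

Under this policy the named crawler is allowed and any newcomer is refused, which is exactly the pattern being objected to.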
Not necessarily. I have a website where 95% (maybe even more) of the traffic is generated by crawlers. If some of them behave badly, it is fair to exclude them in my robots.txt.
But of course, the ones behaving badly tend to not respect the robots.txt, so you end up banning the IP or IP block.
And I am a nice guy about this: a crawler must really be a piece of crap for me to start blocking it.
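Banning an IP or a whole block can be as simple as a membership check against a list of networks. A rough sketch with Python's standard `ipaddress` module follows; the `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical, and the ranges are reserved documentation addresses, not real offenders. In practice this usually lives in the firewall or web server config rather than in application code.

```python
import ipaddress

# Hypothetical block list, using reserved documentation ranges only.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole IP block
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7", "198.51.100.8"):
        print(ip, is_blocked(ip))
```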
This rather bluntly runs up against the fact that permitting crawling is an expense the web operator is taking on; ergo, receiving that content is by definition a privilege, not a right.
I don't know if that's a reply to me or a general remark, but yes, I never understood why you'd include a few big names and ban the rest, for example. That's just screaming anticompetitiveness. I don't know if my mention of robots.txt sounded like I do this, but I do not.