by havnagiggle 984 days ago
Got any reports/statistics to back this up? I highly doubt websites don't want major search engines to index them. AFAIK it's been standard practice to use `User-agent: *` for a long time. Sites use other anti-crawling measures because the bad crawlers are not going to respect your robots.txt.

It's not necessarily a lot of sites that block non-popular bots, but the ones that do are often big, content-centric sites such as social media (e.g. Yelp, Twitter, LinkedIn, Instagram).

That can add up to a serious percentage of the web.
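
To make the pattern concrete, here is a rough sketch of what an allowlist-style robots.txt does to a well-behaved crawler. The rules below are a made-up example, not any named site's actual file, and `SmallIndieBot` and `can_crawl` are hypothetical names used only for illustration. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named major crawler is allowed,
# every other user agent falls through to the catch-all and is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(agent, url="https://example.com/page"):
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SmallIndieBot"):
        print(agent, can_crawl(agent))
```

This only models crawlers that honor robots.txt; as noted upthread, a bad crawler ignores the file entirely, which is why sites layer other measures on top.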