by Symbiote 6 days ago
I have observed the same from Meta's crawler.

  User-agent: *
  Disallow: /
With that robots.txt on, e.g., our preproduction site, Meta is the only big-tech crawler that accesses it, at least with an honest user agent. (Meta also accesses disallowed paths on the production site.)
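For reference, here is a minimal sketch of how a compliant crawler would interpret those two lines, using Python's standard `urllib.robotparser`. The user agent strings, the URL, and the `is_allowed` helper are placeholders chosen for illustration, not anything taken from real logs.

```python
from urllib.robotparser import RobotFileParser

# The same two-line robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Hypothetical agents and URL, for illustration only.
    for agent in ("ExampleBot", "AnotherCrawler/1.0"):
        print(agent, is_allowed(agent, "https://preprod.example.com/any/page"))
```

With a wildcard `Disallow: /`, every agent that honors the file should get `False` for every path, so any request that still arrives under an honest user agent is ignoring the rule.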

I'm not defending Meta here, but I should mention that Meta also uses crawlers to visit pages when someone sends a link through their services.

   User-agent: *
can be ignored by bots, but if they ignore the Disallow rule for their own UA, they can easily be blocked at the network level by AS (autonomous system).
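A rough sketch of what blocking by AS can look like at the application level, assuming you have already looked up the prefixes that the AS announces. The prefixes below are reserved documentation ranges used as placeholders, not Meta's real ones, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges). In practice these
# would be the IPv4/IPv6 prefixes announced by the AS you want to block.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]

def is_blocked(client_ip: str) -> bool:
    """Return True if `client_ip` falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # Only compare addresses and networks of the same IP version.
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_PREFIXES
    )

if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5", "2001:db8::1"):
        print(ip, is_blocked(ip))
```

In a real setup this check would more likely live in the firewall or reverse proxy than in application code; the point is only that matching on network prefixes does not depend on the user agent the bot claims.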