by reconnecting 6 days ago
And assume you have

  User-agent: meta-externalagent
  Disallow: /
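For reference, here is a minimal sketch of what those two lines ask of a compliant crawler, checked with Python's standard-library `urllib.robotparser`. The helper name `is_allowed`, the `example.com` URL, and the `SomeOtherBot` agent are made up for illustration. This only shows how a well-behaved client would read the file; it says nothing about what Meta's crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, verbatim.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url.

    Hypothetical helper: parses the robots.txt text in memory instead of
    fetching it over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders, not taken from the thread.
print(is_allowed(ROBOTS_TXT, "meta-externalagent", "https://example.com/page"))  # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))  # True
```

The second call returns True because this file has no `*` entry, so agents that are not named fall back to "allowed".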

2 comments

I have observed the same from Meta's crawler.

  User-agent: *
  Disallow: /
With that in place on e.g. our preproduction site, Meta is the only big-tech crawler that accesses it, at least with an honest user agent. (Meta also accesses disallowed paths on the production site.)
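If you want to check this kind of thing on your own site, a rough sketch is to replay the access log against the robots.txt and count hits per user agent on disallowed paths. Everything here is an assumption: the log is taken to be in the common "combined" format, the regex `LOG_LINE` and the function `violations` are hypothetical names, and the sample log lines are invented (with documentation-range IPs), not real traffic.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# The blanket rule from the comment above.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

# Assumed "combined" log format:
#   ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def violations(log_lines, robots_txt):
    """Count requests per user agent that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not in the assumed format; skip it
        if match["path"] == "/robots.txt":
            continue  # fetching robots.txt itself is expected
        if not parser.can_fetch(match["ua"], match["path"]):
            counts[match["ua"]] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "meta-externalagent/1.1"',
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /about HTTP/1.1" 200 300 "-" "meta-externalagent/1.1"',
]

print(violations(SAMPLE_LOG, ROBOTS_TXT))
```

This only catches crawlers that send an honest user agent, which is the same caveat as in the comment above.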
I'm not defending Meta here, but I should mention that Meta also uses crawlers to visit pages when someone sends a link through their services.

   User-agent: *
can be ignored by bots, but if they ignore the Disallow rule for their own UA, they can easily be blocked by their network's AS (autonomous system).
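A minimal sketch of what blocking by AS can look like in application code: collect the IP prefixes the AS announces and reject any client address that falls inside one of them. The prefixes below are placeholders from the documentation ranges (RFC 5737 and RFC 3849), not Meta's real ones, and `is_blocked` is a hypothetical helper. In practice you would pull the prefix list from routing data for the AS in question, and usually enforce it at the firewall or CDN rather than in Python.

```python
import ipaddress

# Placeholder prefixes (documentation ranges), standing in for the prefixes
# announced by the AS you want to block. These are NOT real crawler ranges.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in BLOCKED_PREFIXES)


print(is_blocked("192.0.2.55"))    # True
print(is_blocked("198.51.100.1"))  # False
print(is_blocked("2001:db8::1"))   # True
```

Unlike a robots.txt rule, this does not depend on the crawler cooperating or on the user agent string being honest.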
They don't obey *, so they don't get their own entry. I'd rather just poison their data; it's well-known behavior from them.

https://www.reddit.com/r/webdev/comments/1sdzd1q/metas_ai_cr...