HN Mirror


by reconnecting 6 days ago

And assume you have

  User-agent: meta-externalagent
  Disallow: /
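For reference, here is a minimal sketch of how a compliant crawler would be expected to read that rule, using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are illustrative assumptions, not anything from the post.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host.
    print(is_allowed(ROBOTS_TXT, "meta-externalagent", "https://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page"))
```

Under this rule, a parser that follows the standard refuses every path for `meta-externalagent` while leaving other agents unaffected, since no `*` entry is present.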

2 comments

Symbiote 6 days ago

I have observed the same from Meta's crawler. With

  User-agent: *
  Disallow: /

on our preproduction site, for example, Meta is the only big-tech crawler that still accesses it, at least under an honest user agent. (Meta also accesses disallowed paths on the production site.)
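One way to check this kind of observation against your own logs is sketched below. It assumes you have already extracted `(user_agent, path)` pairs from the access log; the function name `find_violations` and the sample requests are hypothetical.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# The blanket rule quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def find_violations(robots_txt, requests):
    """Count requests per user agent that robots_txt disallows.

    `requests` is an iterable of (user_agent, path) pairs, assumed to be
    pre-parsed from an access log. Hypothetical helper for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for user_agent, path in requests:
        if not parser.can_fetch(user_agent, path):
            violations[user_agent] += 1
    return violations


if __name__ == "__main__":
    # Made-up sample data, not real log entries.
    sample = [
        ("meta-externalagent", "/"),
        ("meta-externalagent", "/about"),
        ("SomeOtherBot", "/index.html"),
    ]
    for agent, count in find_violations(ROBOTS_TXT, sample).most_common():
        print(agent, count)
```

This only tells you which self-declared user agents requested disallowed paths; it says nothing about crawlers that lie about their user agent.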


reconnecting 6 days ago

I'm not defending Meta here, but I should mention that Meta also uses crawlers to visit pages when someone sends a link through their services.

  User-agent: *

can be ignored by bots, but if they ignore the Disallow rule for their own UA, they can easily be blocked at the network level by AS (autonomous system).
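As a rough sketch of what blocking by AS could look like in application code: you would collect the IP prefixes announced by the AS in question and reject any client address that falls inside them. The prefixes below are reserved documentation ranges used as placeholders, not Meta's actual ranges, and `is_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 / RFC 3849 documentation ranges).
# In practice these would be the prefixes announced by the AS to block.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "2001:db8::/32")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in BLOCKED_PREFIXES)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.1", "2001:db8::1"):
        print(ip, is_blocked(ip))
```

In a real deployment this check usually lives in the firewall or reverse proxy rather than the application, but the matching logic is the same.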


kev009 6 days ago

They don't obey *, so they don't get their own entry. I'd rather just poison their data; it's well-known behavior from them.

https://www.reddit.com/r/webdev/comments/1sdzd1q/metas_ai_cr...
