by floatingatoll 2530 days ago
We found at one job that approximately one quarter of well-known search engines blatantly use robots.txt noindex declarations as a list of URLs to index, and one openly mocked us for asking them to stop.

Voluntary honor systems don’t work, because there’s no way to compel non-compliers to stop other than standard “anti-attacker arms race” approaches, such as the obstacle described at the head of this thread.

1 comment

It sounds like scraping is a big problem for you guys. What kind of outfit is it, if you don't mind me asking?
Drop me an email and I’m happy to describe further.