by sp332 4094 days ago
I think evidence goes the other way. You can ask web crawlers not to index certain pages with robots.txt even if it would be better for their business if they did. And this is widely respected. Now imagine that IIS put "* deny" in the default site config; it would get a lot less respect.
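
For anyone unfamiliar with the mechanics, here is a minimal sketch of how a well-behaved crawler might honor robots.txt, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, the `allowed` helper, and the URLs are all made up for illustration; a real crawler would fetch the file from the site it is crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example is self-contained.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical rules above permit fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("https://example.com/index.html"))
print(allowed("https://example.com/private/report.html"))
```

Nothing enforces the check; the crawler simply chooses to call it before each request, which is the "widely respected" part.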

This example actually highlights an interesting difference between the two.

I think one of the reasons robots.txt is generally respected is that there's a stick behind that carrot: hypothetically (what with us all using so much cloud these days), a site administrator who notices a traffic spike consistent with something ignoring robots.txt can respond by treating the requests as attacker-originated, which most "legitimate" sites would want to avoid.
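
As a rough sketch of what that "stick" could look like in practice, an administrator might count requests to disallowed paths per client and flag repeat offenders. The prefixes, the threshold, and the `flag_offenders` helper are assumptions for illustration, not any particular server's feature.

```python
from collections import Counter

# Hypothetical disallowed prefixes, as they might appear in a site's robots.txt.
DISALLOWED_PREFIXES = ("/private/", "/admin/")

def flag_offenders(requests, threshold=3):
    """Return client IPs that hit disallowed paths at least `threshold` times.

    `requests` is an iterable of (client_ip, path) pairs, e.g. parsed from
    an access log.
    """
    hits = Counter(
        ip for ip, path in requests if path.startswith(DISALLOWED_PREFIXES)
    )
    return sorted(ip for ip, count in hits.items() if count >= threshold)
```

What happens to a flagged client (rate limiting, blocking, nothing at all) is a separate policy decision.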

What's the stick behind the carrot for do not track?

You can block cookies or even block ad networks.
Yeah, sure, your average end-user is totally going to do that.

That's the difference between the two scenarios. A sysadmin will know what to look for and how to react to it appropriately. Your average end user probably doesn't know, doesn't care, or doesn't know how to react. And a built-in browser implementation will never happen, because it is in all the major companies' best interests not to implement such a feature. If that weren't the case, we'd have had that feature long ago.

I doubt an average sysadmin would ever notice, let alone know what to do about it, let alone put in the time and effort to do it.

Anyway I'm not sure I put the responsibility on the right group. It will probably be down to websites choosing ad networks that respect their users' DNT settings. Just like they choose ad networks that don't host malicious ads or ads that take over the whole page.
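
To make that concrete, here is a minimal sketch of how a site or ad network could honor the header on the server side. A browser with the preference enabled sends `DNT: 1`; the function names and the "behavioral"/"contextual" labels are hypothetical, and treating a missing header as "tracking allowed" is an assumption of this sketch, not a rule everyone agrees on.

```python
def tracking_allowed(headers: dict) -> bool:
    """Return False when the request carries `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"

def choose_ad(headers: dict) -> str:
    """Pick a hypothetical ad type based on the visitor's DNT preference."""
    return "behavioral" if tracking_allowed(headers) else "contextual"

print(choose_ad({"DNT": "1"}))
print(choose_ad({"User-Agent": "ExampleBrowser"}))
```

The point is that the ad still gets served either way; only the tracked variant is skipped.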

That's a lot more probable, yeah. I doubt anything will happen on the end-user side to enforce this.
AdBlock has millions of users. If DNT reduces or even just slows the adoption of ad blockers, that's already a big win for ad networks. You don't necessarily have to track a user to display ads; it's just possible to do more effective advertising if you do.
> I think evidence goes the other way. You can ask web crawlers not to index certain pages with robots.txt even if it would be better for their business if they did.

I disagree. robots.txt is almost always used to hide pages that shouldn't be exposed to the internet and would be useless to index. For example, you don't need robots crawling the HTML document you're statically serving to prove domain ownership for Google Apps.

Everyone wants as many views on their content as possible, so the incentive is to let as much as possible be indexed and to keep the use of robots.txt to a minimum. It would not be good for search engines to crawl the things put into robots.txt.