Hacker News
by jasonvorhe 1830 days ago
They've been respecting robots.txt and tracking opt-outs for years, right? Just one whistleblower and it's over. Why risk it? Also, AFAIK it's opt-in once it leaves the Origin Trial phase [1].

[1] https://twitter.com/Log3overLog2/status/1384337637763387394?...

4 comments

> They've been respecting robots.txt and tracking opt-outs for years, right?

Sort of. Kind of.

Googlebot only follows the part of robots.txt that most specifically refers to it. If the file has a group addressed to Googlebot, it doesn't apply the global (`*`) rules on top of that.
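
For anyone who wants to see that group-selection behaviour in action, here's a rough sketch using Python's standard-library `urllib.robotparser`. To be clear, this is Python's parser, not Google's, and the robots.txt content, the `allowed` helper and the "OtherBot" name are all made up for illustration. It only shows how a parser that picks the most specific matching group treats a global rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one global group and one Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def allowed(agent: str, path: str) -> bool:
    """Ask the stdlib parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The Googlebot group is used on its own, so the global
    # /private/ rule is not applied to Googlebot here.
    print("Googlebot /private/x:", allowed("Googlebot", "/private/x"))
    print("Googlebot /drafts/x: ", allowed("Googlebot", "/drafts/x"))
    # A crawler with no group of its own falls back to the global group.
    print("OtherBot  /private/x:", allowed("OtherBot", "/private/x"))
```

So if you want a rule to apply to Googlebot as well, it has to be repeated inside the Googlebot group.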

Google also explicitly says that Disallow rules don't keep a page out of its results:

> However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed. [0]

[0] https://developers.google.com/search/docs/advanced/robots/ro...
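
As a sketch of what the quoted advice looks like in practice, here's a tiny WSGI app that sends the `X-Robots-Tag: noindex` header. The app name and page body are hypothetical, and whether a given crawler honours the header is up to that crawler. The page must not be disallowed in robots.txt, otherwise the header is never fetched.

```python
def noindex_app(environ, start_response):
    """Minimal WSGI app that asks crawlers not to index this response."""
    body = b"<!doctype html><title>Private</title><p>Not for search results.</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The header named in the quoted docs; the robots meta tag
        # <meta name="robots" content="noindex"> is the in-page equivalent.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve locally with the stdlib reference server (for testing only).
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, noindex_app) as server:
        server.serve_forever()
```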

Googlebot also doesn't respect crawl delays in robots.txt.

So they respect “do not track” headers?

No, but almost everyone ignored it and it never matured out of Candidate Recommendation:

> Efforts to standardize Do Not Track by the W3C in the Tracking Preference Expression (DNT) Working Group reached only the Candidate Recommendation stage and ended in September 2018 due to insufficient deployment and support. [...] Despite supporting it in its Chrome web browser, Google did not implement support for DNT on its websites, and directed users to its online privacy settings and opt-outs for interest-based advertising instead. The Digital Advertising Alliance, Council of Better Business Bureaus and the Direct Marketing Association does not require its members to honor DNT signals.

Source: https://en.wikipedia.org/wiki/Do_Not_Track
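
For context on how little it would take to at least read the signal: the browser sends a `DNT` request header, where `1` means the user opts out of tracking. A minimal server-side check might look like the sketch below. The function name and the plain-dict header format are my own assumptions, not anything Google or the W3C ships.

```python
def dnt_opt_out(headers: dict) -> bool:
    """Return True when the request headers carry `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


if __name__ == "__main__":
    print(dnt_opt_out({"DNT": "1"}))           # user asked not to be tracked
    print(dnt_opt_out({"dnt": "0"}))           # user explicitly allows tracking
    print(dnt_opt_out({"Accept": "text/html"}))  # no preference expressed
```

Reading the header is the easy part; the quote above is about sites choosing not to act on it.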

Not that long ago there was a story about the Google Analytics opt-out add-on at https://tools.google.com/dlpage/gaoptout not doing anything.

>They've been respecting robots.txt

Sorry, I didn't mean to imply that Google ignores robots.txt. My point was that, conceptually, it is easy to ignore, just as it is easy to ignore HTTP headers.

>and tracking opt-outs for years, right?

Is this provable? If I opt out with my Google account in the browser on a desktop, that should imply I want out of all tracking, yet you have to do it in each app on each platform. It's whack-a-mole that is impossible to win.