by redm 2979 days ago
Just some of the ways Google collects data [1]:

Google Search, Google Fiber, Google Recaptcha, Google Translate, Google Adsense, Google Chrome (Safe Browsing checks etc.), Google DNS (8.8.4.4 etc.), Google Mail, Android Integrations, Google GSuite (Sheets, Docs, etc.), Google Drive, Google Analytics, Google AMP, YouTube

So on and so forth. Of course, Google keeps most of this data to themselves to improve products and sell ads, but it's scary how much they have, especially since they broke down the firewall between services. [2]

[1] https://en.wikipedia.org/wiki/List_of_Google_products

[2] https://googleblog.blogspot.com/2012/01/updating-our-privacy...


The data collected for Google Public DNS is actually very reasonable. Have a look at https://developers.google.com/speed/public-dns/privacy.

I haven’t checked the other products; I just wanted to point this out since people seem to assume worse than what’s actually happening (for Google Public DNS, at least).

It's not just what they collect with a single service. It's the amount of clarity that comes from a linkage with their other collected data sets.
On that same web page:

> We don't correlate or combine information from our temporary or permanent logs with any personal information that you have provided Google for other services.

Of course, all of that is meaningless if you want to think that Google is lying in its privacy policies.

(Disclaimer: I work at Google, but not on 8888).

Personally, if they can't prove it, it shouldn't be trusted. And I'm talking about proving it with the source code. The days of granting these companies the benefit of the doubt should be over. The onus should be on the providers to prove their security, not on misplaced trust.
How does providing the source code help with proving trust? You don't know that the service is actually running the source code that was published.

If you're willing to believe that the service matches the published source code, why wouldn't you also be willing to believe that the service matches the published "specifications" (e.g. privacy policy)?

Do any providers you use meet this standard? A few bits of open source software have been subjected to credible public audits, but not most of those. For proprietary services, the detailed results of any audits are typically nonpublic.

This is nice in theory, but in practice it would prevent you from sharing your data with any third-party service beyond your personal circle of trusted acquaintances.

To be fair, that's also what they said for the other services before they changed their minds.
> We don't correlate or combine information from our temporary or permanent logs with any personal information

I'd say that depends on what definition of "personal" they're using.

https://policies.google.com/privacy/key-terms#toc-terms-pers...

"Personal information

This is information which you provide to us which personally identifies you, such as your name, email address or billing information, or other data which can be reasonably linked to such information by Google, such as information we associate with your Google account."

OK, but that just brings up "other data."
Plus all the data gathered from their discontinued tools. Also, even if you're not using Gmail/Android/G Suite, if you're working or communicating with someone who does, Google still gets your information.
A more egregious way Google collects data on internet users is through their hosted libraries service. You visit some completely unaffiliated site which happens to use jQuery or some other library, and instead of hosting it themselves they have a script tag with a src=ajax.googleapis.com/...
For that, see https://developers.google.com/speed/libraries/terms

That stuff is kept separate from all account data (like with Google DNS[0] and fonts[1], too): no common cookies, "unauthenticated" (i.e. no cross-referencing with Google accounts), and logs retain no referrers.

[0] https://developers.google.com/speed/public-dns/privacy [1] https://developers.google.com/fonts/faq#what_does_using_the_...

I don't read it as stating that the data is kept entirely separate. In fact it references the general privacy policy making it quite clear that whatever data is collected is governed by the same rules as everything else.
What data is collected?
What's the point of the data being kept separate when these databases could be linked with minimal effort (e.g. via IP addresses), for example at the request of law enforcement (US or other)?
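To make the IP-linkage point concrete, here is a minimal sketch of how two logs that share no cookie or account ID could still be joined on IP address and timestamp. The log formats, field names, sample records (documentation-range IPs) and the 60-second window are all hypothetical; this is not a description of any actual Google system.

```python
from collections import defaultdict

# Hypothetical records: a "separate" resolver log and an account-activity log.
dns_log = [
    {"ip": "203.0.113.7", "ts": 1000, "query": "clinic.example"},
    {"ip": "198.51.100.2", "ts": 1005, "query": "news.example"},
]
account_log = [
    {"ip": "203.0.113.7", "ts": 990, "account": "alice@example.com"},
]


def link_by_ip(dns_log, account_log, window=60):
    """Pair DNS queries with accounts seen from the same IP within `window` seconds.

    No shared cookie or identifier is needed; the IP address is the join key.
    """
    seen = defaultdict(list)  # ip -> account-log records from that IP
    for rec in account_log:
        seen[rec["ip"]].append(rec)

    linked = []
    for q in dns_log:
        for acct in seen.get(q["ip"], []):
            if abs(q["ts"] - acct["ts"]) <= window:
                linked.append((acct["account"], q["query"]))
    return linked


print(link_by_ip(dns_log, account_log))
```

Shared IPs (NAT, carrier-grade NAT, office networks) would produce false matches, so a real linkage would be noisier than this, but the join itself is trivial.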
They don't use this data currently. It's a hedge against the day when Firefox and Safari start including something like uBlock by default.
The solution to this is here:

https://decentraleyes.org

Pretty cool-looking service. Since I'd never heard of it, I wish the downvoters would explain the problem with it (at the time of writing it's a dark gray, so only a little negative).

Is it simply that people don't see library CDNs as a source of privacy-piercing data?

They generally aren't, as far as I can tell?

SRI (no changing content for a specific user) + crossorigin ('The "anonymous" keyword means that there will be no exchange of user credentials via cookies, client-side SSL certificates or HTTP authentication'), no referrers via meta tag or header.

The other end gets your IP and browser UA, with nothing else. It is pretty low on the totem pole of worry.
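For anyone wanting to apply those three measures to their own pages, a minimal sketch: the SRI value is the algorithm name plus the base64 of the raw digest of the exact file bytes, `crossorigin="anonymous"` makes the fetch without credentials, and a referrer meta tag (or a `Referrer-Policy: no-referrer` response header) stops the Referer. The CDN URL and helper names below are hypothetical, and the hash must be computed over the exact bytes the CDN will serve.

```python
import base64
import hashlib
import html


def sri_hash(content: bytes, algo: str = "sha384") -> str:
    """Subresource Integrity value: '<algo>-' + base64(raw digest of the file bytes)."""
    digest = hashlib.new(algo, content).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(src: str, content: bytes) -> str:
    """Build a script tag pinned to `content`.

    integrity: the browser refuses to run the file if the CDN serves different bytes.
    crossorigin="anonymous": CORS request with no cookies, client certs or HTTP auth.
    """
    return (
        f'<script src="{html.escape(src)}" '
        f'integrity="{sri_hash(content)}" crossorigin="anonymous"></script>'
    )


# Page-wide opt-out of sending the Referer header on subresource requests.
REFERRER_META = '<meta name="referrer" content="no-referrer">'

# Hypothetical usage with a local copy of the library file:
# with open("lib.js", "rb") as f:
#     print(script_tag("https://cdn.example/lib.js", f.read()))
```

As the comment says, even with all of this the CDN still sees your IP address and User-Agent; these measures only remove cookies, tampering and the page URL from the request.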

I guess the problem is that one-liners that just drop a link without explanation are suspected to be spam and they often are.
That certainly wasn't spam, but unfortunately I was in the middle of something else at the time and didn't have time to post the explanation I probably should have included.
> Google Chrome (Safe Browsing checks etc.)

Probably more critical here is Chrome Sync. They can and do read your entire browsing history through that.

With Safe Browsing, they at least still promise (in a legally binding way) that they don't store the data.

> Google Drive

With the small drop of faith I have left in Google, I want to believe they don't read my files and use encryption. Is there any evidence to the contrary?

They use encryption in transit and at rest, but not in between.

So, your data is uploaded over TLS or similar, gets decrypted on the server and then is re-encrypted before it's stored on hard drives.

So yeah, this does mean that they have access to your data. Since the at-rest-encryption happens on the server, Google has the encryption key for that somewhere and can at any point decrypt your data.
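A toy model of the flow just described, under loud assumptions: the class name, the in-memory "disk" and the cipher are all hypothetical, and the cipher is a SHA-256 counter-mode keystream used only because Python's standard library has no AES. It is not secure and is not how Google (or anyone) should encrypt data. The point it shows is structural: whoever generates and keeps the at-rest key can decrypt the stored bytes without the user being involved.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream: SHA-256(key || nonce || counter) blocks. Illustration only, not secure.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))


class ServerSideEncryptedStore:
    """Hypothetical provider: plaintext arrives after TLS is terminated,
    and the provider itself holds the at-rest key."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # generated and kept by the provider
        self._disk: dict[str, bytes] = {}

    def upload(self, name: str, plaintext: bytes) -> None:
        # Encrypted on disk, so a stolen hard drive reveals nothing...
        self._disk[name] = toy_encrypt(self._key, plaintext)

    def provider_read(self, name: str) -> bytes:
        # ...but the provider can decrypt whenever it wants (or is compelled to).
        return toy_decrypt(self._key, self._disk[name])
```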

Presumably not everyone at Google gets your data for reading at home, but that's about as much comfort as you should assume.

The NSA, CIA and FBI can also request that Google decrypt your data and hand it over. They could not do the same if Google used proper end-to-end encryption.

There is one point to be made for not using E2EE, which is that you can't offer a "Forgot Password?"-link. If the user forgets their password, you can't decrypt their data either. All you can do is wipe their data and let them start anew.
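The flip side can be sketched with Python's standard library: in an E2EE design the key is typically derived on the client from the password (here with PBKDF2-HMAC-SHA256), and the server only stores the non-secret salt. This is a minimal sketch under that assumption; the iteration count is illustrative, and the function and variable names are hypothetical. It shows why there is no "Forgot Password?" path: a different password gives an unrelated key, and nothing on the server can regenerate the old one.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor; pick one per current guidance


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key on the client; the server never sees the password or key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# At signup the client picks a random salt; the server may store it (it is not secret).
salt = secrets.token_bytes(16)
key = derive_key("correct horse battery staple", salt)

# Later, only the same password reproduces the same key.
assert hmac.compare_digest(key, derive_key("correct horse battery staple", salt))

# A forgotten or wrong password yields an unrelated key. There is nothing to "reset":
# data encrypted under the old key stays unreadable, so all that's left is to wipe it.
assert not hmac.compare_digest(key, derive_key("wrong guess", salt))
```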

If you use your cloud only for syncing, that's probably not a problem (for example Firefox Sync does exactly that on a scale of millions), but if you use it as a backup or to preserve hard drive space, it can certainly be.

So you'll have to decide for yourself whether being allowed to forget your password is worth the surveillance and lowered security.

If not, use a different service. SpiderOak, Seafile and Mega.nz are a few that do E2EE.

If you do think so, at least use a service that's not at home in a surveillance state and surveillance company...

Why not just link the article directly? [1] That KIA subreddit is notorious for being scummy. [2]

1. https://motherboard.vice.com/en_us/article/9kgwnp/porn-on-go...

2. https://www.polygon.com/2017/11/2/16591508/reddit-content-po...

I was in the middle of some work and it was one of the first results I found on Google. I figured some of the commentary might provide more context, though I'm not really familiar with that sub.
And the Google Fonts CDN.
Correction: I see now that no cookies are sent with Google Fonts CDN requests:

https://developers.google.com/fonts/faq#what_does_using_the_...

You don't need cookies. Your browser requests the font from Google's server, which gives them your IP address. And your browser sends the address of the page you're currently on in the HTTP Referer header.

So, if every webpage you visit loads in its fonts from Google's CDN, they have your complete browsing history from that alone.
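A minimal sketch of that argument, assuming the font CDN logged the Referer header (another commenter in this thread says these logs don't retain referrers; this only shows what the request itself would make possible). The log format and sample entries are hypothetical. What the Referer actually contains depends on the page's and browser's Referrer-Policy: it can be the full URL, only the origin, or absent, so the sketch keeps just the host.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical font-CDN request log: IP, User-Agent and Referer, no cookies.
requests = [
    {"ip": "203.0.113.7", "ua": "Firefox/60.0", "referer": "https://news.example/story/42"},
    {"ip": "203.0.113.7", "ua": "Firefox/60.0", "referer": "https://shop.example/cart"},
    {"ip": "198.51.100.2", "ua": "Chrome/66.0", "referer": "https://blog.example/"},
]


def histories(log):
    """Group the sites each (IP, User-Agent) pair visited, in request order."""
    out = defaultdict(list)
    for r in log:
        if r.get("referer"):  # skip requests where no Referer was sent
            out[(r["ip"], r["ua"])].append(urlsplit(r["referer"]).netloc)
    return dict(out)


for who, sites in histories(requests).items():
    print(who, sites)
```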

Other commonly used Google CDNs: gstatic, jQuery, ajax.googleapis.com