An algorithm that processes private user data is by itself not invading anyone's privacy. It's clear to me that invasion of privacy only happens when humans look at private user data directly, or look at user data that's not sufficiently processed by an algorithm.
Otherwise, something as simple as a spell checker would be an invasion of privacy because it literally looks at every word in an email you write. That's absurd.
At least in my opinion, it makes a big difference where the data lives and where the checking algorithm is run. I wouldn't consider a spell checker a privacy concern as long as it's running locally on my device.
I don't work on email or at Google, but I see two problems.
1) You need to constantly update the spell checker, so each time you say "this is a word" or something like that, the data most likely gets sent off, and the problem is that it's part of your data. I assume Google does something similar with mail that is sent to spam and then marked as not spam. This is full email redirection and analysis, not partial like old word processing.
2) I feel AI makes this even harder: now you can't check patterns as simply as before, and you need to check the whole content constantly.
We've had spell/grammar checkers in word processors that worked totally offline for a long time now. They definitely can be improved with a hosted service but that's by no means necessary and comes with tradeoffs like latency and offline support.
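To make the offline point concrete, here's a minimal sketch of a spell checker that never touches the network. The tiny `WORDS` set is a made-up stand-in for a real on-device dictionary, and the `check` helper is hypothetical; the only library call is `difflib.get_close_matches` from Python's standard library.

```python
import difflib
import re

# Hypothetical on-device dictionary; a real checker would ship a far larger list.
WORDS = {"the", "quick", "brown", "fox", "jumps", "over",
         "lazy", "dog", "email", "privacy"}

def check(text):
    """Return {unknown_word: [suggestions]} using only local data."""
    issues = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in WORDS:
            # Suggestions come from the local word list; nothing is sent anywhere.
            issues[word] = difflib.get_close_matches(
                word, sorted(WORDS), n=3, cutoff=0.7)
    return issues
```

Everything here runs on the device, which is the tradeoff being described: no latency and no data leaving the machine, but the dictionary only improves when you ship an update.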
If an algorithm is looking through private stuff and making a decision based on it or is sending signals where the signal depends on the private stuff, then it's pretty much by definition leaking private information.
An algorithm that leaked no private information would not be useful to a business. It would do a bunch of computation and then throw it away. So realistically anything that looks at private information is privacy-relevant.
You can have debates about how much private information should be leaked and for what purposes. But I don't think having a threshold like "it's all private unless another human reads it" is a good way to think about the issue.
Pre-AI we had a system that watched user patterns and would identify possibly suspect patterns that were outside of the norm. We also had a system that would content-ID the images and attachments to see, in a general way, what was being uploaded. Given enough suspicion, the account would then be opened to look for abusive patterns.
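As a rough illustration of "patterns outside of the norm" (this is not the actual system, whose details aren't given here), a sketch that flags accounts whose activity count sits far above the population mean. The account IDs, the counts, and the `threshold` value are all made-up assumptions.

```python
from statistics import mean, pstdev

def flag_outliers(counts, threshold=2.0):
    """Return account IDs whose count is more than `threshold`
    population standard deviations above the mean."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # everyone behaves identically, nothing stands out
    return sorted(a for a, c in counts.items() if (c - mu) / sigma > threshold)

# Hypothetical daily upload counts per account.
uploads = {"a": 10, "b": 12, "c": 11, "d": 9, "e": 10, "f": 500}
suspects = flag_outliers(uploads)
```

Note that a check like this only needs counts, not message contents; in the flow described above, a human looking at the account comes later, once enough suspicion has built up.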
There is absolutely no promise on any cloud-hosted service that a human will never see your data. However, at Google it was made very, very, VERY clear that if we had to scan somebody's personal email for any reason, then discussing the contents in any way that wasn't legally mandated or required for work would lead to immediate termination and a possible lawsuit for any damage to reputation incurred.
While fixing user accounts or dealing with delivery of content, I saw epic piles of personal email. Besides the ones full of CSAM or other abusive material, I couldn't say that I ever remembered the contents 30 minutes later. It's like a checker at a grocery store. They don't care about whatever embarrassing things you're buying and won't remember you 10 minutes later. =)
You might be thinking of the General David Petraeus case, a national security leak that was slightly worse than Snowden's, but with few repercussions :)
Yep. And it would split uploads across dozens of accounts with parity, so that if any account was disabled it could re-create the data from what was in the other accounts. (Think BitTorrent using IMAP-uploaded content in Gmail.)
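The parity trick can be sketched in a few lines. This is not the tool being described, just a generic single-parity (RAID-4-style) scheme under stated assumptions: the data is cut into `k` equal-size shards plus one XOR parity shard, each hypothetically stored in a different account, so any one lost shard can be rebuilt from the rest. The function names are made up for illustration.

```python
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data, k):
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]

def recover(shards, length):
    """Rebuild the original data when at most one shard is None (lost)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:length]

data = b"hello parity world"
pieces = split_with_parity(data, 4)
pieces[2] = None  # pretend the account holding shard 2 was disabled
restored = recover(pieces, len(data))
```

A single XOR parity shard survives the loss of exactly one account. Tolerating several disabled accounts at once would need a stronger erasure code such as Reed-Solomon, which this sketch does not attempt.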