by FractalNerve 3176 days ago
Is all voice in a room with a person wearing Google Buds transmitted to the cloud??

Isn't that prohibited, and eavesdropping on unsuspecting citizens?

I mean it's not just temporarily processed and immediately deleted afterwards. That's Google trying to get WiFi SSIDs, (Satellite, Street, Indoor, Face, Object) Pictures, Voices, real-time Geo-Location of consumers and whatnot.. it's getting out of hand..

I was shocked when I first discovered that my "ok, Google" conversations were stored in the cloud indefinitely with a badly designed and hidden Web-UI that allows "deletion" (whatever that legally means with Google-speak).

4 comments

I know that this is a point that is raised a lot of times, but if we think about it ---

isn't cloud connectivity almost a necessity for things to work as well as they do?

In particular, Google's game is machine learning... it can't play that game without the data it has (e.g. Google Now wouldn't be able to connect the dots between my booking a hotel room and my buying plane tickets... conveniently showing directions to the hotel just as I get done at the airport, etc. etc.).

Consider that languages are an always-evolving entity. I'm barely 30 and I have seen idioms and sayings fall and rise within the last decade. Certainly it is true that the best translation of some 20-word sentence would be different if it is done 10 or maybe even 5 years apart.

And of course it is a matter of fact that we all talk about the same things... e.g. some shooting occurred at some place recently, which means people will be talking about it and looking it up. It means we might say the name of a place or a hotel that we haven't said before... the translation service will be better off if it has some awareness of that and can bias its voice transcription accordingly. For Google to do its best job when it comes to translation, it needs to be connected to all of us, it has to be with us in the Cloud.

It’s absolutely possible to do machine learning based tasks without sending all data to an ad network that builds a profile of your entire life to serve you ads.

Apple does a lot of on-device work, and anonymises whatever does need to talk to their backend (e.g. map directions).

The problem is, Google can’t adopt that pattern even if they want to (which they don’t) because their whole push for years has been about very thin (software-wise) clients talking to a largely web-based set of Google services.

The question is not does it need cloud, the question is, is it worth it. When you send an audio stream to Google, you are imposing your answer to that question on everybody else in the room.
Or let any third party [view a photo | talk to you on the phone | listen to a voice memo ] with everybody else in the room. We already have answers to those "worth it" questions, largely.

The more important question is whether a third party can use that data to cause you harm, and if that's a concern, I'm afraid the only real solution is to limit your ability to be included in that data.

Not necessarily… all life works without a cloud, doesn’t it? You can certainly realize your example with a completely local AI. You may need the cloud for training (at the moment), but not for day-to-day execution!

> Is all voice in a room with a person wearing Google Buds transmitted to the cloud??

No, of course not. Ignoring for a moment that Translate already works perfectly fine offline, this would only ever transmit anything after you hit the button. It's not passively transmitting everything it hears to the cloud. Even if you ignore the privacy implications of that, it'd drain the battery crazy fast.

Are you sure?

From: https://www.theverge.com/circuitbreaker/2017/10/4/16408962/n...

"And this year’s Pixel will take advantage of the phone’s always-on microphones to listen for music (not just the phrase “OK Google”) and display what you’re listening to on the screen, even if it’s something on the radio."

Pretty sure it can't do that without at least transmitting audio fingerprints to the cloud passively...

Idea: a device that sniffs wifi and bluetooth MAC addresses and warns you when there's a Google Pixel2 in earshot...

(Note: That's the Pixel2, not the earbuds, but for the "magic" to work, the earbud wearers will have one of those in their pockets...)

The feature you linked is also handled on-device.

Really? I'd love to know how they do that. Can you _really_ cram a usable-sized db of popular music fingerprints into something small enough to store locally on a phone these days?

I work for Google but have no clue what underlying tech is being used here. But I have some familiarity with audio fingerprinting, so I thought I'd comment.

Using a not particularly efficient but reliable fingerprinting algorithm such as http://www.ismir2002.ismir.net/proceedings/02-FP04-2.pdf, you need 2.6 kbit/s for the fingerprint. Popular songs tend to be short, so let's say 3:30 per song on average. That works out to about 70 kB per song, or 70 MB per 1000 songs. But compression and/or other sorts of encoding cleverness could massively reduce this number. Bottom line is that it wouldn't be hard to store thousands of fingerprints given a few hundred megabytes of storage (out of 64 GB or more).
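
As a rough check of the arithmetic in the comment above, here is a minimal Python sketch. The 2.6 kbit/s rate and the 3:30 average length come straight from the comment; the 300 MB budget and the function names are assumptions picked for illustration, and compression is ignored.

```python
# Back-of-the-envelope fingerprint storage estimate.
# Both constants are the commenter's figures, not measured values.
FINGERPRINT_BITS_PER_SEC = 2_600
AVG_SONG_SECONDS = 3 * 60 + 30  # 3:30


def fingerprint_bytes(seconds, bits_per_sec=FINGERPRINT_BITS_PER_SEC):
    """Uncompressed fingerprint size in bytes for a track of the given length."""
    return seconds * bits_per_sec / 8


def songs_that_fit(budget_bytes, per_song_bytes):
    """Number of whole fingerprints that fit in a storage budget."""
    return int(budget_bytes // per_song_bytes)


per_song = fingerprint_bytes(AVG_SONG_SECONDS)
print(f"per song:       {per_song / 1e3:.2f} kB")
print(f"per 1000 songs: {per_song * 1000 / 1e6:.2f} MB")
# 300 MB is a hypothetical budget standing in for "a few hundred megabytes".
print(f"songs in 300 MB: {songs_that_fit(300e6, per_song)}")
```

This gives roughly 68 kB per song, which matches the "about 70 kB" estimate, and a few thousand songs in a few hundred megabytes before any compression.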

You don't need the entire song for a fingerprint, and the MusicBrainz (MetaBrainz) project has long been supported by Google.

I couldn't find anything which said how big their db is, but you could do some artist/song popularity smarts to figure out the tracks a user is most likely to listen to.

But to constantly be searching through fingerprints for every sound? I'd be interested to see what Google's solution to this may be.
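
To make the "constantly searching" concern concrete, here is a naive Python sketch of brute-force fingerprint matching. It is not Google's method (nobody in the thread knows what that is). It assumes fingerprints are lists of 32-bit integers compared by bit error rate, and the 0.35 cutoff, the `best_match` name and the toy database are all hypothetical choices for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


def best_match(query, db, max_ber=0.35, bits=32):
    """Slide the query over every stored fingerprint.

    Returns (title, bit_error_rate) for the best alignment whose error
    rate is at most max_ber, or None if nothing is close enough.
    """
    best = None
    for title, fp in db.items():
        for offset in range(len(fp) - len(query) + 1):
            window = fp[offset:offset + len(query)]
            errors = sum(hamming(q, f) for q, f in zip(query, window))
            ber = errors / (len(query) * bits)
            if ber <= max_ber and (best is None or ber < best[1]):
                best = (title, ber)
    return best


# Toy database: real fingerprints would be thousands of words per song.
db = {
    "song_a": [0x0F0F0F0F, 0x12345678, 0xDEADBEEF, 0xCAFEBABE, 0x0BADF00D],
    "song_b": [0xFFFFFFFF, 0x00000000, 0xFFFFFFFF, 0x00000000, 0xFFFFFFFF],
}
# A noisy two-word excerpt of song_a with one bit flipped.
query = [0xDEADBEEF ^ 1, 0xCAFEBABE]
print(best_match(query, db))
```

The cost of this loop grows with the total length of every stored fingerprint, which is why a scan like this would not be run on every sound; a practical system would need some kind of index or a much smaller candidate set.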

There were 75K albums (not just songs) released in 2010 alone. The total number of songs is in the hundreds of millions.

No way you can fit their fingerprints and the metadata into 64GB, compressed or not.

Shazam has 40M fingerprints in their DB, and they definitely don't have everything. Their software runs on beefy servers.
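
For scale, here is a small Python sketch that applies the roughly 70 kB-per-song estimate quoted earlier in the thread to the catalog sizes mentioned here. The per-song figure is that commenter's uncompressed estimate, not a measured value, and metadata is ignored; the function name is made up for illustration.

```python
# ~70 kB per song: the uncompressed estimate quoted earlier in the thread.
PER_SONG_BYTES = 70_000
PHONE_STORAGE_BYTES = 64_000_000_000  # the 64 GB phone mentioned above


def catalog_bytes(num_songs, per_song_bytes=PER_SONG_BYTES):
    """Total uncompressed fingerprint storage for a catalog."""
    return num_songs * per_song_bytes


for label, n in [("40M songs (Shazam's DB)", 40_000_000),
                 ("100M songs", 100_000_000)]:
    print(f"{label}: {catalog_bytes(n) / 1e12:.1f} TB")

# How many fingerprints would fit if the whole phone held nothing else.
max_songs = PHONE_STORAGE_BYTES // PER_SONG_BYTES
print(f"songs that fit in 64 GB: {max_songs}")
```

Under these assumptions a full catalog lands in the terabyte range, while 64 GB holds fewer than a million uncompressed fingerprints, so an on-device database would have to be a popularity-ranked subset.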

Yes.

I'm a bit late, but I watched the announcement for this feature and they explicitly said that nothing was sent to the cloud to determine what song it was.

I remember this stood out to me, because that was a large concern of mine when I first heard about the feature. I respected that they went out of their way to make this point.

She said that it is based on machine learning, and something like a database of "audio samples". It uses the samples to figure it out, I guess.

> the earbud wearers will have one of those in their pockets

Use a recording app to start recording from the mic on your phone, put it in your pocket, and then talk at normal conversation volume. It's not gonna pick up much, to say nothing of a person on the other side of the room.

You don't know this at all, and should not be speculating.

> Isn't that prohibited and eavesdropping onto unsuspecting citizens?

I would assume the privacy expectations of public/private spaces used for photography also apply here. If you happen to be in the frame when I take a photo, am I prohibited from uploading the image to Facebook or Google Photos?

Depends on whom you take a photo of, where, and from where. It's a complicated matter legally, but the moral is simple. Don't take a photograph in which someone is visible without their consent. If you do, at least don't upload it anywhere public or online.

In Germany there's a law giving you the right to your own image[1]; it's difficult in the USA and less difficult in other countries[2].

EDIT:

Voice recordings made in public without consent are practically illegal in Germany[3] and also in the USA[4][5].

--

[1] https://de.wikipedia.org/wiki/Recht_am_eigenen_Bild_(Deutsch...

[2] https://en.wikipedia.org/wiki/Photography_and_the_law

[3] https://dejure.org/gesetze/StGB/201.html

[4] https://en.wikipedia.org/wiki/Legality_of_recording_by_civil...

[5] https://www.rcfp.org/browse-media-law-resources/digital-jour...

From [2]: "Should the subjects not attempt to conceal their private affairs, their actions immediately become public to a photographer using normal photographic equipment."

Not sure photography matches audio recording, but for photography here's a good reference:

https://commons.m.wikimedia.org/wiki/Special:MyLanguage/Comm...

Well, I think the obvious way to solve this is simply to make it obvious that the translation is occurring with a third party (who may do anything with the data). I think this is still fantastic technology even if it is limited to public conversations.