by JosephLark 3180 days ago
Edit up top: The identification is done on-device. The Verge article didn't mention this.

> And this year’s Pixel will take advantage of the phone’s always-on microphones to listen for music (not just the phrase “OK Google”) and display what you’re listening to on the screen, even if it’s something on the radio.

This sounds creepy. So now when excessive microphone data is seen to be going out to the cloud, they can just say "Oh, the phone thought there was music playing and was trying to identify it. Simple misunderstanding, nothing nefarious!".

3 comments

Except it doesn't send any data to the cloud. They mentioned that explicitly. There's an on-phone database of songs that it is matching against.
I wonder how much space this takes on the device. How many signatures does it keep, and how often is it updated?
It's not very significant. Basically it stores some distances between peaks in each song's spectrogram which can then be further compressed.

Even supporting a database of millions of songs would be possible.
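
To make the peak-distance idea concrete, here is a rough sketch of a generic peak-pair fingerprint. It is not how the Pixel does it (nobody in this thread knows that); it only illustrates storing distances between spectrogram peaks. All names (`spectrogram`, `peaks`, `fingerprint`, `match_score`, `tone_sequence`) and the parameters (64-sample frames, one peak per frame, `fan_out=3`) are made up for the example, and the "songs" are synthetic tone sequences.

```python
import cmath
import math

def spectrogram(samples, frame_size=64):
    """Magnitude spectrogram from a naive DFT over non-overlapping frames."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peaks(spec):
    """Keep only the strongest frequency bin of each frame: (frame, bin)."""
    return [(t, max(range(len(mags)), key=mags.__getitem__))
            for t, mags in enumerate(spec)]

def fingerprint(samples, fan_out=3):
    """Hash each peak with the next few peaks as (bin1, bin2, time_delta).

    Only the time *difference* is stored, so a clip taken from the middle
    of a song produces the same hashes as the full song does.
    """
    pk = peaks(spectrogram(samples))
    hashes = set()
    for i, (t1, f1) in enumerate(pk):
        for t2, f2 in pk[i + 1:i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes

def match_score(clip, song_hashes):
    """Fraction of the clip's hashes that also occur in the song."""
    clip_hashes = fingerprint(clip)
    return len(clip_hashes & song_hashes) / len(clip_hashes)

def tone_sequence(bins, frame_size=64, frames_per_tone=2):
    """Synthetic 'song': one pure tone per entry, each lasting whole frames."""
    samples = []
    for k in bins:
        for n in range(frame_size * frames_per_tone):
            samples.append(math.sin(2 * math.pi * k * n / frame_size))
    return samples

song_a = tone_sequence([3, 7, 12, 5, 9, 14, 4, 11])
song_b = tone_sequence([6, 2, 10, 13, 8, 1, 15, 20])
clip = song_a[4 * 64:10 * 64]  # six frames from the middle of song_a

print(match_score(clip, fingerprint(song_a)))
print(match_score(clip, fingerprint(song_b)))
```

A real system would pick several peaks per frame, tolerate noise and misaligned frames, and pack each hash into a few bytes, which is where the size of the on-device index comes from.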

If each signature occupies an effective space of 1 kB (not sure how feasible this is), then 1,000 songs would take 1 MB, 10,000 songs would take 10 MB, and so on.

Every year gives us around 100 popular songs (add a % of location-specific popular ones), so it seems the plan is feasible.
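
For anyone who wants to play with the numbers, here is the same back-of-the-envelope arithmetic as a tiny script. The 1 kB per signature is the guess from this comment, not a measured value, and the function names (`index_size_bytes`, `human`) are just for illustration. The 30 million row uses the Spotify catalog size quoted elsewhere in the thread.

```python
def index_size_bytes(num_songs, bytes_per_signature=1000):
    """Total index size, assuming a fixed (guessed) size per signature."""
    return num_songs * bytes_per_signature

def human(n):
    """Format a byte count using decimal units (1 kB = 1000 B)."""
    for unit in ("B", "kB", "MB", "GB"):
        if n < 1000:
            return f"{n:g} {unit}"
        n /= 1000
    return f"{n:g} TB"

for songs in (1_000, 10_000, 30_000_000):
    print(f"{songs:>10,} songs: {human(index_size_bytes(songs))}")
```

At that guessed signature size, a few thousand popular songs fit in a handful of megabytes, while a full streaming catalog would run to tens of gigabytes.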

Spotify has 30 million songs.

Even if it takes up a small amount of space, it's basically a non-feature.

>Spotify has 30 million songs.

Many of those songs have never been played[1]. There's a really (really, really, really) long tail.

[1]: http://forgotify.com/

The index can be highly compressed, see for example

https://blog.afterthedeadline.com/2010/01/29/how-i-trie-to-m...

> They mentioned that explicitly

Okay? The submitted article certainly didn't mention that.

The AnandTech live blog does indeed state "01:05PM EDT - On device machine learning. Local music identificat (sic)"

Other articles did mention that this will be local.

I have a feeling they are using federated machine learning for this, so that most of the processing is done locally and the radios are activated as little as possible. They have been making big strides in that area lately, and this might be the start of some of its major applications. (I think they have been using it in their keyboard prediction for a while as well.)

So there's going to be a huge database of songs on my phone? I don't think anybody asked for it. It sounds like trickery, and now it's going to take up storage space.

Does anyone know how many MB or GB this occupies on the phone? Can we just clear this data?

The phone is just storing song fingerprints, probably no more than a few megs for every song ever recorded. It would be great to extract the data and release it for public use.
This was my thought as well. Hopefully someone extracts this database for anyone to hack on.
I doubt it, that would be a good chunk of data for what I see as a fairly small feature.

More likely (this is a guess; nobody outside Google really knows at this point), they will use federated machine learning to figure out that something is "a song", then perhaps clean up and isolate the actual "song" part of it and send that over to a Google server for processing.

But again it was just announced, so nobody really knows how this works, where the data is or goes, and what tradeoffs were made.

They announced, in the talk, that this is all done locally on the phone. The phone will have a database of, iirc, 10,000 songs locally.

[edit] typos

Ah! Then I'm sure they have some way of making the data needed much smaller than I thought it could be!
They only need to store (and occasionally update) the kernel parameters for the trained deep neural net. Very small indeed.
Lots of devices already do this. My Echo occasionally misinterprets its wakeword and broadcasts up little 4-second clips of whatever is going on at the time it decides to do that. If you're worried about so-called "accidental" identification that allows them to activate listeners and receive the sound data from the room, that's already a pervasive threat.

Reminder that US intel exploited a bug (or a "bug") in Samsung Smart TVs that allowed them to surreptitiously activate the built-in microphones and stream the room's sound on-demand, obviously with no notification to the user. [0]

That gets me curious, did anyone try running that malware and see which servers it transmitted up to? Would be interesting to go through logs and see, retroactively, who this was used on in the wild. Would be even _more_ interesting if it proxied through a tunnel at a cooperative BigCo...

[0] https://wikileaks.org/ciav7p1/cms/page_12353643.html