


	by bastawhiz 1253 days ago
	While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic. Edit: it's worth noting that adding that much compute would dramatically increase the cost of the device. If they ate the additional hardware cost, they'd need a way to recoup it. But that's silly for a company that's literally in the business of cloud computing, when the only purpose of the extra hardware would be to hide what they're doing. When will people start asking why there's a full GPU in their Echo?

1 comments

gruez 1253 days ago

> While plausible on paper, it's not practical unless they jam an order of magnitude or two more compute into the devices. To get reasonable accuracy (i.e., enough to be able to use for profit) from any casual speech, the current models run far from realtime on a modern MacBook. You're not going to squeeze reasonable accuracy from the tiny processor on the devices in the world today, even if you record and process async as a way to hide from people inspecting traffic.

Do you really need 100% accuracy here? This isn't like cops setting up a wiretap. Google isn't waiting for you to slip up and admit that you like Funko Pops or whatever. If you're constantly talking about your cat, or about wanting to get a car, that's all they need to target ads to you.

Also, the processing doesn't have to be real time. It doesn't matter that Google learns about your cat 8 hours late because the device is running its ML models in the background while you're asleep. If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up. On the off chance you have a house party and it picks up 6 hours of speech, it can always buffer the audio for later, or drop it altogether (see the paragraph above about how it doesn't need to pick up everything).
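To make the arithmetic explicit, here is a minimal sketch. It assumes the device can spend all 24 hours of the day processing, and the function names are made up for illustration.

```python
from fractions import Fraction

def required_speed(speech_hours_per_day, processing_hours_per_day=24):
    """Minimum speed, as a fraction of real time, needed to clear one
    day's recorded speech within the available processing hours.
    Pass whole numbers of hours (Fraction needs integers here)."""
    return Fraction(speech_hours_per_day, processing_hours_per_day)

def backlog_hours(speech_hours, speed, processing_hours=24):
    """Hours of audio still unprocessed after one day at the given speed."""
    processed = speed * processing_hours
    return max(Fraction(0), Fraction(speech_hours) - processed)

normal_day = required_speed(3)                # 3 h of speech over 24 h
party_day = required_speed(6)                 # 6 h of speech over 24 h
leftover = backlog_hours(6, Fraction(1, 8))   # party day at the normal-day speed
```

Under those assumptions a normal day needs 1/8x, a party day would need 1/4x, and a device that only manages 1/8x ends the party day with 3 hours of audio to buffer or drop.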

bastawhiz 1253 days ago

It kind of does matter, actually. Lots of English words (and words in other languages!) sound the same. A cat lover who starts getting ads for baseball bats and fat loss pills isn't going to convert. Context matters, too, not just matching words. If I start talking about "my dear father" and get ads for tractors and hunting gear, I'm not going to convert.

Advertisers aren't going to pay for random spoken keywords anyway. They're going to pay to target people by demographic and interest. Things _about_ you, not things you're talking about. Just because I mentioned tampons doesn't mean I'll ever buy a box of tampons (I simply lack the anatomy). And if you start building a profile about somebody based on poorly-overheard bits of speech, you're building a castle on bad foundations. The data is bunk.

Just having a TV or radio on near the device will poison the data.

> If the device picks up 3 hours of speech per day, it only needs to process at 1/8x speed to catch up.

The Echo currently has a 32-bit processor that is designed to be pretty minimal. OpenAI Whisper tiny runs at about 2/3 of real-time speed, and that's on a 6-core ~2.3 GHz laptop processor. The CPU in the Echo runs at 0.6-1 GHz, and the system is not designed for general-purpose computing. I don't have the ability to benchmark it, but you're not going to get close to 1/8x with the Echo hardware.
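As a back-of-envelope check (not a benchmark), here is a rough sketch that scales the laptop figure by cores times clock speed. The single Echo core is my assumption, not a spec I've verified, and linear scaling ignores SIMD width, memory bandwidth and everything else that separates a laptop CPU from a small embedded one, so if anything it flatters the Echo.

```python
def naive_speed_estimate(ref_speed, ref_cores, ref_ghz, target_cores, target_ghz):
    """Scale a measured speed by the ratio of (cores x clock).
    Deliberately crude: assumes throughput is linear in both."""
    return ref_speed * (target_cores * target_ghz) / (ref_cores * ref_ghz)

laptop_speed = 2 / 3  # Whisper tiny on the 6-core ~2.3 GHz laptop, as a fraction of real time

# ASSUMPTION: one core at the top of the 0.6-1 GHz range.
estimate = naive_speed_estimate(laptop_speed, ref_cores=6, ref_ghz=2.3,
                                target_cores=1, target_ghz=1.0)

needed = 1 / 8  # the catch-up speed from the quoted comment
falls_short = estimate < needed
```

Even with that generous scaling, the estimate comes out at roughly 0.05x, well under the 1/8x needed.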