Hacker News
by ipnon 1608 days ago
Speech models today can mine the entire corpus of published conversation and return the most likely response to a given statement. That's not how we converse. Every relationship you have is a little model in your brain that we call a person's "personality." Everyone talks differently, has different frames of reference, uses different codes of language, and makes different assumptions. Cutting edge speech models work perfectly for the perfectly average speaker, but that person does not exist! The farther we stray from the mean, the more alienating these speech models become.
2 comments

> Cutting edge speech models work perfectly for the perfectly average speaker, but that person does not exist!

This is a well-known pitfall in statistics, I would think, given there are extremely famous stories about this exact issue causing deaths. In the 1950s, the Air Force was trying to figure out why its pilots were dying, and determined it was because cockpits designed around the "average" pilot were a poor fit for almost every real-world pilot.[1]

1: https://www.thestar.com/news/insight/2016/01/16/when-us-air-...

I have massive issues with speech recognition software. It doesn't work for me in either English or Spanish. Statements like "Google and Siri are so advanced now" feel like people are collectively pranking me.

That said, I too have wondered why we don't have speech control for computers or at least appliances.

You don't need to parse all of language. A standard set of primitives, like the ones you'd find on a remote, should be way easier to recognize, and they can even be selected for their ease of parsing. Simple things like on, off, next, back, louder, etc.
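
To make the closed-vocabulary idea concrete, here is a rough sketch in Python. It assumes some recognizer has already produced a text guess for the spoken word, which is the hard part and is not shown. The `COMMANDS` list and `match_command` are made-up names for illustration, and `difflib` compares spelling, not sound, so treat it as a stand-in for a real acoustic matcher.

```python
import difflib

# Hypothetical closed vocabulary, taken from the examples in the comment above.
COMMANDS = ["on", "off", "next", "back", "louder"]

def match_command(transcript, cutoff=0.6):
    """Return the command closest to a (possibly misheard) word, or None.

    Assumption: `transcript` is text from some upstream speech recognizer.
    """
    word = transcript.strip().lower()
    # get_close_matches ranks candidates by string similarity and
    # drops anything scoring below `cutoff`.
    matches = difflib.get_close_matches(word, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With only a handful of candidates, a slightly wrong guess such as "lowder" can still snap to "louder", while something unrelated is rejected instead of being forced onto a command.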

An interesting project: automatically convert a terminal command's `--help` page to a speech model. Run that over $PATH, then you never have to type again!
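
A first step toward that might look like the sketch below: pull the long options out of a help page and turn each into a speakable phrase. This is only an assumption about how such a project could start. `spoken_vocabulary`, the regex, and the `frob` help text are all invented for illustration, and real `--help` output varies far more than this handles.

```python
import re

# Matches long options such as --help or --dry-run, but not a "--"
# that appears in the middle of a word.
LONG_FLAG = re.compile(r"(?<![\w-])--([a-z][a-z0-9-]*)")

def spoken_vocabulary(help_text):
    """Map a spoken phrase ("dry run") to the flag it stands for ("--dry-run")."""
    vocab = {}
    for name in LONG_FLAG.findall(help_text):
        vocab[name.replace("-", " ")] = "--" + name
    return vocab

# Made-up help page for a hypothetical command called `frob`.
SAMPLE_HELP = """\
Usage: frob [OPTIONS] FILE
  -n, --dry-run     show what would be done
      --verbose     print more detail
  -h, --help        show this message
"""

if __name__ == "__main__":
    for phrase, flag in spoken_vocabulary(SAMPLE_HELP).items():
        print(f"say {phrase!r} -> {flag}")
```

The resulting phrase list is the kind of small, per-command vocabulary a recognizer could be restricted to. Not every program on $PATH responds to `--help`, so a real version would need to cope with that.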