Hacker News
by CoolestBeans 1148 days ago
Voice control is a bit of a nonstarter as the speech recognition (perhaps combined with the weird acoustics of a moving vehicle) has never worked for everyone; e.g., my father's accent never seems to work in his Honda. My car is from 2014 and definitely has more buttons than necessary. I don't mind infrequently used controls being behind a touchscreen: do the controls to set the time on the clock really need dedicated buttons? But the issue as I see it is that what's frequently used by some is infrequently used by others. For example, I never take my car's drive mode out of eco, but my sister switches to sport when she goes up big hills. Once you get out of the category of obviously frequently controlled things (gear selector, climate control, media controls), you really have to start making judgement calls and picking and choosing who you're going to make uncomfortable. For example, do you prioritize a radio selector? Some people never use the AM/FM radio in their car. Some people only listen to AM/FM radio and never connect their smartphone.
3 comments

Speech recognition has mostly gotten over the accent hump, and with LLMs you can bridge the gap.

For fun I tried an experiment: in one context I told GPT to mangle sentences the way a speech-to-text system might, and in another I told GPT to transform sentences into commands for a car's various ECUs.

Results:

"Torn the hair conditioning gown to 65 decrees." => works

"Can yew plays make it warmer in hear" => works

"It's weigh two, uh how due yew say, it's way to frayed in hear!" => works (!)

It actually understood a mis-transcribed French word in the middle of a series of English commands:

> I apologize for the confusion. I interpreted "frayed" as an incorrect transcription of "froid" (which means "cold" in French) considering the context of the sentence and the surrounding phrase "how do you say." Thus, I replaced "frayed" with "cold."

It even realized that someone inserting filler like "how do you say" indicated they may be looking for a word in another language:

> The phrase "how do you say" is often used by someone who is trying to recall a word or phrase in another language, or when they are unsure about the correct term to use in the current language. In this context, it signaled that there might be a language-related issue, leading me to consider that "frayed" might be an incorrect transcription of a word in another language, such as "froid" for "cold" in French.

-

And in case you think it just guessed based on past commands, I was able to replicate this in a fresh context window with no hints about what commands it should accept.

Voice is really about to stop sucking for the first time in the history of tech: It can go from "I'm tired of this shit man" to knowing it should change the current song.

100% agree. While they can't do partial differential equations, they are very good at discerning intent in even very noisy language. Taking a free-form language instruction and encoding it in a structured form is specifically a powerful capability here. I have a feeling voice assistants are about to become extraordinarily powerful.
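
To make "structured form" a bit more concrete, here is a minimal sketch of the receiving side, assuming the model is asked to reply with a small JSON object. The schema, the action names, and the value ranges are all made up for illustration; this is not any real vehicle or ECU API. The point is only that the noisy part (language) ends at a strict validator before anything reaches the car.

```python
import json
from dataclasses import dataclass

# Hypothetical command schema: action name -> allowed (low, high) integer range.
# These names and limits are assumptions for the example, not a real car API.
ALLOWED_ACTIONS = {
    "set_temperature": (60, 85),  # degrees Fahrenheit (assumed range)
    "set_fan_speed": (0, 5),
}


@dataclass(frozen=True)
class Command:
    action: str
    value: int


def parse_command(raw: str) -> Command:
    """Validate a model's JSON reply before it is allowed to become a command."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    action = data.get("action")
    value = data.get("value")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("value must be an integer")

    low, high = ALLOWED_ACTIONS[action]
    if not low <= value <= high:
        raise ValueError(f"{action} must be between {low} and {high}, got {value}")
    return Command(action, value)
```

So a reply like `{"action": "set_temperature", "value": 65}` becomes a `Command`, while anything outside the whitelist raises `ValueError` instead of being passed along.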
> Once you get out of the category of obviously frequently controlled things (gear selector, climate control, media controls), you really have to start making judgement calls and picking and choosing who you're going to make uncomfortable.

User-assignable buttons might mitigate that. With profiles for individual drivers.

But perhaps those features would confuse some users even more.
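
As a rough sketch of what per-driver button assignment could look like as data, assuming a couple of spare physical buttons with factory defaults (the button ids and function names below are invented for the example):

```python
from dataclasses import dataclass, field

# Hypothetical factory defaults for two assignable buttons.
DEFAULT_BUTTONS = {"B1": "drive_mode", "B2": "radio_source"}


@dataclass
class DriverProfile:
    name: str
    # Maps a physical button id to the function this driver assigned to it.
    buttons: dict[str, str] = field(default_factory=dict)


def resolve(profile: DriverProfile, button: str) -> str:
    """Return the driver's assignment, falling back to the factory default."""
    return profile.buttons.get(button, DEFAULT_BUTTONS.get(button, "unassigned"))


# Usage: one driver remaps B1 and leaves B2 alone.
sister = DriverProfile("sister", {"B1": "sport_mode"})
print(resolve(sister, "B1"))  # sport_mode
print(resolve(sister, "B2"))  # radio_source
```

Falling back to a default is one way to keep this from confusing people who never touch the settings: an untouched profile behaves exactly like a car without the feature.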

To me this is always a really interesting line of thinking. We customize the heck out of our phones but the car is off limits!?

I want my doors to lock every night at 10pm in case I forget.

I want to bring up my favorite temperature and fan settings.

I want to lock just some doors.

I want the windows down just the right amount.

When do we get to take advantage of the fact that this is all software?
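
Something like the 10pm door lock is not much code once it is treated as a rule. Here is a minimal sketch, assuming the car polls its rules more often than once a day and uses naive local time; `DailyRule`, `due_actions`, and the action string are hypothetical names, not a real vehicle API.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class DailyRule:
    at: time     # local time of day the rule should fire
    action: str  # hypothetical action name, e.g. "lock_all_doors"


def due_actions(rules, last_check: datetime, now: datetime):
    """Return actions whose daily time fell in the window (last_check, now].

    Assumes checks happen more often than once a day, so only the calendar
    days of last_check and now need to be considered.
    """
    due = []
    for rule in rules:
        for day in {last_check.date(), now.date()}:
            scheduled = datetime.combine(day, rule.at)
            if last_check < scheduled <= now:
                due.append(rule.action)
                break  # fire at most once per check
    return due


# Usage: lock the doors every night at 10pm.
rules = [DailyRule(time(22, 0), "lock_all_doors")]
print(due_actions(rules, datetime(2024, 1, 1, 21, 59), datetime(2024, 1, 1, 22, 1)))
```

Comparing against the previous check time rather than testing `now == 22:00` means the rule still fires if the check happens a minute late, and does not fire twice.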

I never liked voice recognition until I had to test it myself at my job and saw firsthand just how much better it has gotten over the past few years.

I think voice commands would be much more widely used if they hadn’t rolled it out so soon, ironically. But most of us learned early on that it sucks and never tried it again.