| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by BoorishBears 1148 days ago

Speech recognition has mostly gotten over the accent hump, and with LLMs you can bridge the gap.

For fun I tried an experiment, in one context I told GPT to mangle sentences in a way that a speech-to-text might, and in another I told GPT to transform sentences into commands for a car's various ECUs.

Results:

"Torn the hair conditioning gown to 65 decrees." => works

"Can yew plays make it warmer in hear" => works

"It's weigh two, uh how due yew say, it's way to frayed in hear!" => works (!)

It actually understood a mis-transcribed French word in the middle of a series of English commands:

> I apologize for the confusion. I interpreted "frayed" as an incorrect transcription of "froid" (which means "cold" in French) considering the context of the sentence and the surrounding phrase "how do you say." Thus, I replaced "frayed" with "cold.""

It even realized that someone inserting filler like "how do you say" indicated they may be looking for a word in another language:

> The phrase "how do you say" is often used by someone who is trying to recall a word or phrase in another language, or when they are unsure about the correct term to use in the current language. In this context, it signaled that there might be a language-related issue, leading me to consider that "frayed" might be an incorrect transcription of a word in another language, such as "froid" for "cold" in French.

And And in case you think it just guessed on past commands, I was able to replicate this in a fresh context window with no hints about what commands it should accept.

Voice is really about to stop sucking for the first time in the history of tech: It can go from "I'm tired of this shit man" to knowing it should change the current song.

1 comments

fnordpiglet 1148 days ago

100% agree. While they can’t do partial differential equations, they are very good at discerning intent in even very noisy language. Taking a free form language instruction and encoding in a structured form is specifically a powerful capability here. I have a feeling voice assistants are about to become extraordinarily powerful.

link