| Speech recognition has mostly gotten over the accent hump, and LLMs can bridge the remaining gap. For fun I tried an experiment: in one context I told GPT to mangle sentences the way a speech-to-text system might, and in another I told GPT to transform the mangled sentences into commands for a car's various ECUs. Results: "Torn the hair conditioning gown to 65 decrees." => works; "Can yew plays make it warmer in hear" => works; "It's weigh two, uh how due yew say, it's way to frayed in hear!" => works (!). It actually understood a mis-transcribed French word in the middle of a series of English commands: > I apologize for the confusion. I interpreted "frayed" as an incorrect transcription of "froid" (which means "cold" in French) considering the context of the sentence and the surrounding phrase "how do you say." Thus, I replaced "frayed" with "cold." It even realized that someone inserting filler like "how do you say" may be looking for a word in another language: > The phrase "how do you say" is often used by someone who is trying to recall a word or phrase in another language, or when they are unsure about the correct term to use in the current language. In this context, it signaled that there might be a language-related issue, leading me to consider that "frayed" might be an incorrect transcription of a word in another language, such as "froid" for "cold" in French.
And in case you think it just guessed based on past commands, I was able to replicate this in a fresh context window with no hints about what commands it should accept. Voice is really about to stop sucking for the first time in the history of tech: it can go from "I'm tired of this shit, man" to knowing it should change the current song. |
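
For anyone who wants to try the two-context setup described above, here is a rough sketch. The prompt wording, the `run_experiment` name, and the `llm` callable are all assumptions for illustration; the comment does not give the actual prompts or say which API was used. `llm` stands in for whatever function sends a single prompt to a model in a fresh context and returns the reply.

```python
from typing import Callable, Tuple

# Hypothetical prompts: the original comment does not give the exact wording.
MANGLE_PROMPT = (
    "Rewrite the sentence below the way an error-prone speech-to-text "
    "system might transcribe it (homophones, misheard words).\n\n"
    "Sentence: {text}"
)
COMMAND_PROMPT = (
    "The text below is a possibly mis-transcribed voice request made in a car. "
    "Reply with a single command for the car's systems.\n\n"
    "Transcript: {text}"
)


def run_experiment(sentence: str, llm: Callable[[str], str]) -> Tuple[str, str]:
    """Mangle a sentence in one context, then interpret it in a separate one.

    `llm` is assumed to start a fresh context on every call, so the second
    call sees only the mangled text and never the original sentence.
    """
    mangled = llm(MANGLE_PROMPT.format(text=sentence))
    command = llm(COMMAND_PROMPT.format(text=mangled))
    return mangled, command
```

Keeping the two calls in separate contexts is the point of the design: if the interpreting model could see the clean sentence, recovering the intent would prove nothing.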