Voice-to-text is a multi-million, if not billion, dollar endeavor. I simply built on the Web Speech API, which ostensibly uses Google's (and possibly Apple's) voice recognition system. I came up with the text comprehension bit, which is limited by the input quality (what I get from the API) and the training set. I've been adjusting and adding to the training set since I first released this, but the matching has worked as well as I'd hoped.
My training set is specifically designed to be conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.
Here are some examples of the input it gets and has no clue what to do with:
- " um changes nice so when you change the things december " matched to: -1: no match
- " nexus 10 " matched to: 13: Huh?
- " pictures " matched to: 14: Huh?
- " call " matched to: 17: Cool
- "videos " matched to: 16: Huh?
- " change " matched to: 15: Huh?
And here is some input from people who did get it:
- " hi what's your name " matched to: 23: My name
- "what's your name " matched to: 27: My name
- "what do you do " matched to: 31: What I do
- "why should we hire you " matched to: 38: Why you should hire me
- "what's your favorite food " matched to: 35: Food
- "wendy's see yourself in 5 years " matched to: 28: Goals
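For illustration, matching a transcript against a training set of example questions might look roughly like the sketch below. It is not the site's actual matcher, which isn't shown here: the token-overlap (Jaccard) scoring, the 0.5 threshold, the `TrainingEntry` shape, the function names, and the ids are all assumptions made for the example.

```typescript
// Hypothetical training entry: an answer id, a label, and example phrasings.
interface TrainingEntry {
  id: number;
  label: string;
  examples: string[];
}

interface MatchResult {
  id: number;
  label: string;
  score: number;
}

// Lowercase, drop punctuation other than apostrophes, split on whitespace.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

// Jaccard similarity between two token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((t) => {
    if (setB.has(t)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Return the best-scoring entry, or id -1 ("no match") when nothing
// reaches the threshold. The threshold value is an assumption.
function matchTranscript(
  transcript: string,
  training: TrainingEntry[],
  threshold = 0.5
): MatchResult {
  const tokens = tokenize(transcript);
  let best: MatchResult = { id: -1, label: "no match", score: 0 };
  for (const entry of training) {
    for (const example of entry.examples) {
      const score = jaccard(tokens, tokenize(example));
      if (score >= threshold && score > best.score) {
        best = { id: entry.id, label: entry.label, score };
      }
    }
  }
  return best;
}

// Illustrative training set; ids and labels are made up for the example.
const sampleTraining: TrainingEntry[] = [
  { id: 1, label: "My name", examples: ["what's your name"] },
  { id: 2, label: "What I do", examples: ["what do you do"] },
];
```

With this sample set, a transcript like " hi what's your name " shares three of its four tokens with the first entry and is matched to it, while " nexus 10 " shares no tokens with any example and falls through to the `-1` result. A scheme like this also shows why off-topic input does poorly: it can only ever score well against phrasings that are already in the training set.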
> My training set is specifically designed to be conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.
I didn't either. As soon as I started asking general questions about you (e.g. 'what's your email address'), the results got better. Now that I know your answers are predefined around your personal info, I understand why other questions don't work well.
Perhaps many people, like me, read only the div starting with "Let's chat" and then get started immediately (the red recording button caught my attention right away), completely ignoring the "I'll try to answer here" div where your intent is written.