Hacker News
by ddod 4476 days ago
Since the Wit.ai guys gave a shout out on my thread earlier today, I thought I'd return the favor and show off my own implementation of a voice>>text>>comprehension system that I used to make my personal site voice interactive: https://benwasser.com

I'm really glad there's work and advancement being done in this arena and I'm hoping to see more people playing around with it.

1 comments

Wow, ddod, your voice>>text is awesome. Did you implement it all on your own?

Some improvement to the text>>comprehension would be great. Right now it does not understand many of my queries. Keep up the good work!

Voice to text is a multi-million if not billion dollar endeavor. I simply implemented the Web Speech API, which ostensibly uses Google's (and possibly Apple's) voice recognition system. I came up with the text comprehension bit, which is limited by the input quality (what I get from the API) and the training set. I've been adjusting and adding to the training set since I first released this, but the matching has worked as well as I'd hoped.

My training set is specifically designed around conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.

Here are some examples of the input it gets and has no clue what to do with:

- " um changes nice so when you change the things december " matched to: -1: no match

- " nexus 10 " matched to: 13: Huh?

- " pictures " matched to: 14: Huh?

- " call " matched to: 17: Cool

- "videos " matched to: 16: Huh?

- " change " matched to: 15: Huh?

And this is a set of input where people did get it:

- " hi what's your name " matched to: 23: My name

- "what's your name " matched to: 27: My name

- "what do you do " matched to: 31: What I do

- "why should we hire you " matched to: 38: Why you should hire me

- "what's your favorite food " matched to: 35: Food

- "wendy's see yourself in 5 years " matched to: 28: Goals
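
For anyone curious how matching like this might work, here is a minimal sketch. It is not the site's actual code: the scoring method (Jaccard word overlap), the `TRAINING_SET` contents, the `match` function, and the 0.4 threshold are all assumptions made up for illustration. The returned index is just the position in this hypothetical list and has nothing to do with the numbers in the log lines above.

```python
import re

# Hypothetical training set: (answer label, example phrasings).
# The real site's training data is not public; these are made up.
TRAINING_SET = [
    ("My name", ["what's your name", "who are you"]),
    ("What I do", ["what do you do", "what is your job"]),
    ("Why you should hire me", ["why should we hire you"]),
    ("Food", ["what's your favorite food"]),
    ("Goals", ["where do you see yourself in 5 years"]),
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match(utterance, threshold=0.4):
    """Return (index, label) of the best-scoring answer, or (-1, 'no match').

    The threshold is an arbitrary assumption: below it, the transcript
    is treated as something the training set does not cover.
    """
    tokens = tokenize(utterance)
    best_index, best_score = -1, 0.0
    for index, (_label, examples) in enumerate(TRAINING_SET):
        for example in examples:
            score = jaccard(tokens, tokenize(example))
            if score > best_score:
                best_index, best_score = index, score
    if best_score < threshold:
        return -1, "no match"
    return best_index, TRAINING_SET[best_index][0]
```

With this toy data, `match(" hi what's your name ")` returns `(0, "My name")`, the misheard `match("wendy's see yourself in 5 years ")` still lands on `(4, "Goals")` because most of the words overlap, and `match(" nexus 10 ")` returns `(-1, "no match")`. Word overlap tolerates a misrecognized word or two, but it can do nothing with input that falls outside the training set.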

> My training set is specifically designed to be conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.

I did not either. As soon as I started asking general questions about you (e.g. 'what's your email address'), the results got better. Now that I know your answers are limited to your personal info, I understand why other questions don't work well.

Perhaps many people, like me, only read the div starting with "Let's chat" and then get started immediately (because the red recording button catches their attention right away), totally ignoring the "I'll try to answer here" div where your intent is written.

If you're interested in conversational voice & intent recognition, check out voicebox.com