Hacker News
by ashraymalhotra 2876 days ago
Just curious: when online, I would expect Google's speech-to-text models to outperform most offline-only models. I also believe their WaveNet text-to-speech models are among the best available to developers?

https://cloud.google.com/speech-to-text/docs/streaming-recog...

1 comment

If you're looking into developing something, I would look at three things.

1.) Form factor - it's amazing how much of this industry relies on either hearing aids or holding a mic/phone in someone's face.

2.) Quality of voice - I'm lucky. I am not "deaf", I just have trouble picking out voices at certain frequencies, or all voices if there's enough noise. Luckily, I don't have to speak with an electronic voice. If I did, I'd want a voice that sounds a little human and has some inflection. This is a tough problem, but if you can solve it, the Kurzweils of the world would be playing catch-up.

3.) Targeting - I damaged my hearing when I was young, addicted to very very loud music (especially live) and not particularly bright. I'm in my forties and still not particularly comfortable using assistive technology. If I were a teen, I'd be fucking mortified to use any of these. Fuck, I could have easily failed high school because of that. I can't escape the feeling that there's a market for people like me. I would love an audiologist's office with tattoos and Bad Religion. If that feeling came to an assistive device, I'd be a customer.

Of course, I'm a weirdo and it's not always the best idea to start a company to cater to a weirdo!! :)