

by superice 3723 days ago

At my company we did this reasonably well; we have just released our product to the market. We do have the advantage of an app-based model: every app the user installs on our product improves the speech recognition and increases the number of available actions.
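To make the app-based idea concrete, here is a minimal sketch. It assumes that each installed app ships a list of command phrases with one handler per phrase, and that the assistant pools those phrases into a vocabulary a recognizer could be biased toward. `App`, `Assistant`, `vocabulary` and `handle` are hypothetical names for illustration, not the commenter's actual design. The exact-phrase lookup also shows what "hardcoded responses and actions" look like in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class App:
    """Hypothetical app manifest: phrases it understands and the action for each."""
    name: str
    commands: Dict[str, Callable[[], str]]


@dataclass
class Assistant:
    apps: List[App] = field(default_factory=list)

    def install(self, app: App) -> None:
        # Installing an app adds its phrases and actions to the assistant.
        self.apps.append(app)

    def vocabulary(self) -> Set[str]:
        """Words from all installed command phrases (assumed to feed recognizer hints)."""
        words: Set[str] = set()
        for app in self.apps:
            for phrase in app.commands:
                words.update(phrase.split())
        return words

    def action_count(self) -> int:
        # Total number of actions grows with every installed app.
        return sum(len(app.commands) for app in self.apps)

    def handle(self, transcript: str) -> str:
        # Hardcoded dispatch: an exact phrase match triggers the app's action.
        text = transcript.lower().strip()
        for app in self.apps:
            action = app.commands.get(text)
            if action is not None:
                return action()
        return "Sorry, I don't know how to do that."
```

For example, after `assistant.install(App("lights", {"turn on the lights": lambda: "Lights on"}))`, calling `assistant.handle("Turn on the lights")` returns `"Lights on"`. A real product would match far more loosely than an exact string lookup; that matching logic is where most of the effort goes.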

Getting started is not that hard; getting good is. Parsing speech correctly is a hard problem. Take numbers, for example: "Nineteen Eighty-Four" can be parsed as 19 and 84, as 1984, or as 9 10 80 4. There are challenges, certainly, but creating a simple program with something like wit.ai is doable. Prepare to write a lot of speech-parsing logic and to implement every piece of functionality by hand. Magical as Siri, Google Now, and Cortana may seem, most of what they do is hardcoded responses and actions. That need not be a problem, but I can promise you that smart assistants lose a lot of their magic once you realize it's just a bunch of hardcoded responses and actions.
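The number ambiguity can be illustrated with a small sketch that takes already-transcribed number words and lists several plausible readings: every word on its own, tens-plus-units compounds merged, and a year-style reading of two two-digit groups. The word tables and the function names `group_compounds` and `candidate_parses` are hypothetical, and choosing *which* reading is correct (from context, the intent, or the app being addressed) is deliberately left out. The sketch also does not model acoustic mis-segmentation such as "nine ten" for "nineteen".

```python
# Hypothetical, deliberately small vocabulary of English number words.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def word_value(word):
    """Value of a single number word; raises if the word is unknown."""
    for table in (UNITS, TEENS, TENS):
        if word in table:
            return table[word]
    raise ValueError(f"not a number word: {word!r}")


def group_compounds(tokens):
    """Merge a tens word followed by a non-zero unit ('eighty four' -> 84)."""
    values = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in TENS and nxt in UNITS and nxt != "zero":
            values.append(TENS[word] + UNITS[nxt])
            i += 2
        else:
            values.append(word_value(word))
            i += 1
    return values


def candidate_parses(text):
    """Return the distinct numeric readings of a phrase, in a fixed order."""
    tokens = text.lower().replace("-", " ").split()
    literal = tuple(word_value(t) for t in tokens)   # e.g. (19, 80, 4)
    grouped = tuple(group_compounds(tokens))         # e.g. (19, 84)
    candidates = [literal, grouped]
    # Year-style reading: two two-digit groups, e.g. (19, 84) -> 1984.
    if len(grouped) == 2 and all(10 <= v <= 99 for v in grouped):
        candidates.append((grouped[0] * 100 + grouped[1],))
    # Drop duplicates while preserving order.
    unique = []
    for c in candidates:
        if c not in unique:
            unique.append(c)
    return unique
```

With these definitions, `candidate_parses("nineteen eighty four")` returns `[(19, 80, 4), (19, 84), (1984,)]`. Even this toy version needs its own rules for hyphens, compounds, and years, which is the kind of hand-written parsing logic the comment warns about.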

Anyway, I don't want to discourage you from trying, because it's really interesting to try and see what challenges you run into. The getting-started kit for speech recognition is mostly wit.ai or Google's speech-to-text (STT) engine. Keep in mind that none of the big companies do everything from scratch. Sure, Google has its own speech recognition, but recognizing the trigger phrase (the term for 'OK Google' or 'Hey Siri') is outsourced. Every piece of software that has such a trigger uses the same library. Remember: using libraries is not cheating; it's focusing on your core task, which is writing the parsing and the actions.