by endisneigh
1710 days ago
If you wanted to do something like "OK Google" with AssemblyAI, would you have to transcribe everything and then look for the substring "OK Google" at the application layer (and therefore incur the full cost of listening constantly)? It'd be cool if you could train a phrase locally, on your own premises, and then use that to trigger the real transcription. This probably wouldn't be super difficult to build, but I was wondering whether it's already available (I didn't see anything at a glance).
There are some open source libraries that make this relatively easy:
- https://github.com/Kitt-AI/snowboy (appears to be shut down now)
- https://github.com/cmusphinx/pocketsphinx
This avoids having to stream audio 24/7 to a cloud model, which would be super expensive. That said, I'm pretty sure that what Alexa does, for example, is send any positive wake word detection to a cloud model (one that is bigger and more accurate) to verify the local wake word model's prediction, AFAIK.
Once you are confident a wake word has actually been detected, that's when you start streaming to an accurate cloud-based transcription model like AssemblyAI, which keeps costs to a minimum!
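For what it's worth, here is a rough sketch of that gating logic in Python. Everything in it is hypothetical: `gate_audio`, `local_detect`, and `cloud_verify` are placeholder names I made up, not part of snowboy, pocketsphinx, or AssemblyAI. You would plug in whichever local detector and verification step you actually use, and you'd probably want silence detection rather than a fixed frame count to end the utterance.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes  # one chunk of raw audio; the format is an assumption


def gate_audio(
    frames: Iterable[Frame],
    local_detect: Callable[[Frame], bool],        # hypothetical cheap on-device detector
    cloud_verify: Callable[[List[Frame]], bool],  # hypothetical second-opinion check
    max_utterance_frames: int = 50,
    context_frames: int = 5,
) -> Iterator[Frame]:
    """Yield only the frames that follow a verified wake word.

    Frames yielded here are the ones you would forward to the paid
    transcription service; everything else stays on the device.
    """
    recent: List[Frame] = []  # short rolling buffer sent along for verification
    remaining = 0             # frames left to forward for the current utterance

    for frame in frames:
        if remaining > 0:
            # Wake word already confirmed: forward this frame.
            remaining -= 1
            yield frame
            continue

        # Still idle: keep only a small window of recent audio.
        recent.append(frame)
        recent = recent[-context_frames:]

        # The cloud check runs only when the local model fires.
        if local_detect(frame) and cloud_verify(recent):
            remaining = max_utterance_frames
            recent = []


# Toy run with fake "audio": b"WAKE" is a true wake word, b"FAKE" a false alarm.
frames = [b"a", b"WAKE", b"x", b"y", b"z", b"FAKE", b"q"]
forwarded = list(
    gate_audio(
        frames,
        local_detect=lambda f: f in (b"WAKE", b"FAKE"),
        cloud_verify=lambda ctx: ctx[-1] == b"WAKE",
        max_utterance_frames=2,
    )
)
print(forwarded)
```

The point of the two-stage check is that the expensive model only ever sees the few frames around a local hit, plus the utterance that follows a confirmed one.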