by arbol
1309 days ago
The Pi isn't really fast enough to process speech in real time. DeepSpeech by Mozilla was cited as an offline alternative to the Google speech API, but it's difficult to set up with Mycroft and doesn't work very well (lack of data and lag - https://mycroft.ai/voice-mycroft-ai/). Because of this, Mozilla set up Common Voice (https://commonvoice.mozilla.org/en) to help build open datasets of voice recordings.
|
If you've got an iPhone... put it into airplane mode so that it is local only. You'll note that Siri no longer works when you do this. However... open up the Notes app and tap the microphone. Dictate some interesting text...
> Mister Smith said that he wanted a two by four and half of a pie.
If you don't have an iDevice, it transcribes this as:
> Mr. Smith said he wanted a 2 x 4 and 1/2 of a pie
That is without a network and done in real time. We can compare the relative processing capabilities of an iPhone and the RPi, but offline speech to text is feasible on a device of limited capabilities.
Additionally, you can do limited-vocabulary speech to text on a chip ( https://www.imagesco.com/articles/hm2007/SpeechRecognitionTu... - https://www.amazon.com/HM2007-Speech-Recognition-Integrated-... ). This can handle the specific incantations for common tasks (think closer to how a car voice control works - say exactly these words in this order), and that can help with performance for things that are often done.
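
To make the "say exactly these words in this order" idea concrete, here is a rough sketch of a fixed-phrase command matcher. It is not the HM2007's interface. It assumes something upstream already hands you a text transcript, and the phrases, action names, and function names are all made up for illustration.

```python
# Hypothetical command table: exact phrases mapped to made-up action names.
COMMANDS = {
    "turn on the lights": "lights_on",
    "turn off the lights": "lights_off",
    "what time is it": "tell_time",
}


def normalize(transcript: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_command(transcript: str):
    """Return the action for an exact phrase match, or None if nothing matches."""
    return COMMANDS.get(normalize(transcript))


if __name__ == "__main__":
    print(match_command("Turn on the lights!"))
    print(match_command("lights on please"))
```

The lookup is a single dictionary hit, which is why this kind of fixed grammar is cheap enough for small hardware. The tradeoff is the one noted above: anything phrased differently simply doesn't match.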