by jamesonthecrow 2486 days ago
Obviously the big cloud players offer their own APIs and SDKs (for a price), but there are a few other solutions worth looking at.

Facebook has open-sourced some pre-trained models: https://github.com/facebookresearch/wav2letter

Picovoice has some smaller, more efficient models capable of running on edge devices: https://github.com/Picovoice

Full ASR does require quite large models and datasets, but you don't need nearly that much power or data to fine-tune a model for your own domain.

2 comments

Wasn't aware of Picovoice. Just tried the live demo they have on their website... wow, it's... not great! Even if I spoke as precisely as possible and/or put on an American accent, it was way off the mark.
Ah, thanks, I wasn't aware Picovoice had released an actual engine yet.