by _iebm 3008 days ago
Seems all the methods in the writeup are APIs (not sure about wit or sphinx), so what's missing is locally run processes like DeepSpeech. On that same note, I'd like to see more accuracy comparisons across all these methods, plus pricing (Google's comes to around $1.44 / recorded hour?), since that's a significant factor.
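
For anyone checking the arithmetic behind the roughly $1.44 / recorded hour figure, here is a rough sketch of how a per-increment rate turns into an hourly one. The $0.006 per 15-second increment rate is an assumption on my part (it isn't in the writeup), and `hourly_cost` is just a hypothetical helper name:

```python
from decimal import Decimal


def hourly_cost(price_per_increment: Decimal, increment_seconds: int) -> Decimal:
    """Cost of one hour of audio billed in fixed-length increments.

    Assumes increment_seconds divides 3600 evenly.
    """
    increments_per_hour = 3600 // increment_seconds
    return price_per_increment * increments_per_hour


# Assumed rate (not from the writeup): $0.006 per 15-second increment.
assumed_rate = Decimal("0.006")
print(hourly_cost(assumed_rate, 15))
```

Swap in whatever rate and billing increment each service actually lists to compare them on a per-hour basis.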

From prior use, Google's speech API (at least the "video" model) is freakishly accurate compared to DeepSpeech, to the point where I wondered if they used closed captioning to help train their model. But I haven't seen the rest of these at work: https://i.imgur.com/cdOlARO.png