by trowngon 1716 days ago
I believe most people have already moved to offline engines. No need to send your data to some random guys like this Assembly. NeMo Conformer from Nvidia, Robust Wav2Vec from Facebook, Vosk. There are a dozen options. And the cost is $0.01 per hour, not $0.89 per hour like here.
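To put that price gap in numbers, here is a quick back-of-the-envelope sketch. It takes the two per-hour figures quoted above at face value; the 1000-hour workload and the function names are made up for illustration only.

```python
# Prices as quoted in the comment, in USD per hour of audio.
OFFLINE_USD_PER_HOUR = 0.01
CLOUD_USD_PER_HOUR = 0.89


def transcription_cost(hours: float, usd_per_hour: float) -> float:
    """Return the cost in USD of transcribing `hours` of audio."""
    return hours * usd_per_hour


def price_ratio(cloud: float = CLOUD_USD_PER_HOUR,
                offline: float = OFFLINE_USD_PER_HOUR) -> float:
    """Return how many times more expensive the cloud price is."""
    return cloud / offline


if __name__ == "__main__":
    hours = 1000  # hypothetical workload, not a figure from the thread
    print(f"offline: ${transcription_cost(hours, OFFLINE_USD_PER_HOUR):.2f}")
    print(f"cloud:   ${transcription_cost(hours, CLOUD_USD_PER_HOUR):.2f}")
    print(f"ratio:   {price_ratio():.0f}x")
```

At the quoted prices the cloud option comes out roughly 89 times more expensive per hour of audio, whatever the workload size.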

Another advantage is that you can do more custom things - add words to vocabulary, detect speakers with biometric features, detect emotions.

1 comment

Without talking about accuracy, any comparison is meaningless.
You don't even need to compare accuracy, you can just check the technology. The Facebook model was trained on 256 GPU cards, and you can fine-tune it to your domain in a day or two. The release was 2 months ago. There is no way any cloud startup can have something better in production given they have access to just 4 Titan cards.