Hacker News
by blackkettle 1058 days ago
The comparison is from a competitor, but it's believable IMO. Basically about the same performance as Whisper:

- https://deepgram.com/learn/nova-speech-to-text-whisper-api

Not surprising though, since at this level the differences between all these options are getting washed out by inconsistencies in the manual ground truth. Conformer alone also isn't the most powerful architecture out there for speech. This is also slower than, say, running a large k2 Zipformer via ONNX on CPU.
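
To make the ground-truth point concrete, here's a rough sketch of word error rate (WER) scored against two reference transcripts of the same audio. The sentences are made up for illustration, and real benchmarks usually normalize text before scoring, which this skips:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical data: one model output, two human transcripts of the same clip.
hypothesis = "okay i will call you at 5 pm"
ref_a = "okay i will call you at 5 pm"   # annotator A
ref_b = "ok i'll call you at five pm"    # annotator B, different conventions

wer_a = wer(ref_a, hypothesis)
wer_b = wer(ref_b, hypothesis)
print(f"WER vs annotator A: {wer_a:.3f}")
print(f"WER vs annotator B: {wer_b:.3f}")
```

Same output, same audio, and the score swings a lot depending on whose transcript is the reference. Once the models are this close, that noise is comparable to the gaps being reported.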

Also, if you're a small shop, at this point you can do all of this yourself with Whisper large-v2 on a single 16 GB GPU via some tweaking of https://github.com/guillaumekln/faster-whisper and an OSS LLM.
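
For anyone who hasn't tried it, a minimal sketch of the faster-whisper part, going by the usage shown in that repo's README. The audio path is a placeholder, `format_segments` is just a helper name made up for this example, and the "tweaking" and LLM post-processing aren't covered here:

```python
def format_segments(segments):
    """Render (start, end, text) triples as '[start -> end] text' lines."""
    return [f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"
            for start, end, text in segments]

def transcribe(path: str):
    """Transcribe one audio file with Whisper large-v2 on a GPU."""
    # Imported lazily so the formatting helper works without the library installed.
    from faster_whisper import WhisperModel

    # float16 on CUDA; other compute types are an option if memory gets tight.
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe(path, beam_size=5)
    # `segments` is a generator, so the transcription actually runs here.
    return [(s.start, s.end, s.text) for s in segments]

if __name__ == "__main__":
    import sys
    # Usage (assumes a CUDA GPU and `pip install faster-whisper`):
    #   python transcribe.py audio.mp3
    if len(sys.argv) > 1:
        print("\n".join(format_segments(transcribe(sys.argv[1]))))
```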

Interesting stuff, but I think margins in this space are about to simply vanish.

1 comment

Deepgram will correlate the text in your transcription with the timestamp at which it was uttered. This is really, really impressive and useful.
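
A rough sketch of what word-level timestamps buy you: looking up when a phrase was spoken. The `(word, start, end)` tuples and the `find_phrase` helper are assumptions made for illustration, not Deepgram's actual response format:

```python
def find_phrase(words, phrase):
    """Return (start, end) times for each occurrence of `phrase`.

    `words` is a list of (word, start_seconds, end_seconds) tuples.
    """
    target = phrase.lower().split()
    if not target:
        return []
    tokens = [w.lower() for w, _, _ in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # Start of the first matched word, end of the last matched word.
            hits.append((words[i][1], words[i + len(target) - 1][2]))
    return hits

# Hypothetical word timings for a short clip.
words = [
    ("margins", 0.0, 0.4), ("are", 0.4, 0.5), ("getting", 0.5, 0.8),
    ("ready", 0.8, 1.1), ("to", 1.1, 1.2), ("vanish", 1.2, 1.7),
]
hits = find_phrase(words, "ready to vanish")
print(hits)
```

With that you can jump the audio player straight to the quoted passage, which is most of the practical value.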