Hacker News
by
syntaxing
151 days ago
Is there something similar for STT? I'm using distilled Whisper models and they work OK, but sometimes they get what I say completely wrong.
daemonologist
151 days ago
Parakeet is not really more accurate than Whisper, but it's much faster - faster than realtime even on CPU: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3. You have to use NeMo though, or mess around with third-party conversions. (It also has a big brother, Canary: https://huggingface.co/nvidia/canary-1b-v2. There's also the confusingly named/positioned Nemotron speech: https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...)
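For anyone who wants to check the "faster than realtime" claim on their own hardware, here is a rough sketch. It measures the real-time factor (processing time divided by audio duration, so anything below 1.0 is faster than realtime). The helper names `real_time_factor` and `load_parakeet` are made up for illustration, and the NeMo loader assumes `nemo_toolkit[asr]` is installed and that `transcribe` returns objects with a `.text` attribute, which may differ between NeMo versions.

```python
import time
from typing import Callable, Tuple


def real_time_factor(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, float]:
    """Run transcribe(audio_path) and return (text, processing_time / audio_duration)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = clock()
    text = transcribe(audio_path)
    elapsed = clock() - start
    return text, elapsed / audio_seconds


def load_parakeet() -> Callable[[str], str]:
    """Hypothetical loader; assumes NeMo is installed and the model can be downloaded."""
    # Imported lazily because NeMo is a heavy optional dependency.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v3"
    )
    # Assumption: recent NeMo versions return hypothesis objects exposing `.text`.
    return lambda path: model.transcribe([path])[0].text
```

Usage would look something like `text, rtf = real_time_factor(load_parakeet(), "clip.wav", audio_seconds=30.0)`, where `clip.wav` and its duration are placeholders for your own recording. The first call includes warm-up cost, so timing a second run gives a fairer number.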
jokethrowaway
151 days ago
Parakeet feels much more accurate in practice than Whisper; it was a real "a-ha" moment for me.
Of course, it's English only.
satvikpendem
151 days ago
Keep in mind Parakeet is pretty limited in the number of languages it supports compared to Whisper.
phoronixrly
151 days ago
From the other day: https://github.com/cjpais/Handy