by
nomad_horse
306 days ago
Whisper has an encoder-decoder architecture, so it's hard to run in streaming mode efficiently, though whisper-streaming is a thing. By contrast, https://kyutai.org/next/stt is a natively streaming STT model.
1 comment
woodson
305 days ago
There are many streaming ASR models based on CTC or RNN-T. Look, for example, at sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, diarization, and much more.
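
To illustrate why CTC-style models lend themselves to streaming, here is a minimal sketch of greedy CTC decoding done chunk by chunk. It is not sherpa-onnx's API; the class name `StreamingCTCGreedyDecoder`, the blank index 0, and the per-frame score lists are all assumptions made for the example. The point is that the decoder only needs to remember the previous frame's label, so tokens can be emitted as soon as each audio chunk has been scored.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


class StreamingCTCGreedyDecoder:
    """Greedy CTC decoding that can be fed frames one chunk at a time."""

    def __init__(self, blank: int = BLANK) -> None:
        self.blank = blank
        # Label of the last frame seen, carried across chunk boundaries.
        self.prev: Optional[int] = None

    def feed(self, frames: Sequence[Sequence[float]]) -> List[int]:
        """Consume per-frame scores for one chunk and return newly emitted tokens."""
        emitted: List[int] = []
        for scores in frames:
            # Greedy choice: the highest-scoring label for this frame.
            best = max(range(len(scores)), key=scores.__getitem__)
            # CTC collapse rule: drop blanks and drop repeats of the previous frame.
            if best != self.blank and best != self.prev:
                emitted.append(best)
            self.prev = best
        return emitted


def one_hot(index: int, size: int) -> List[float]:
    """Build a toy score vector that puts all mass on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]
```

Feeding the same frames in one call or split across several calls gives the same token sequence, which is the property an encoder-decoder model like Whisper does not get for free.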