| HN Mirror



	by nomad_horse 306 days ago
	Whisper uses an encoder-decoder architecture, so it's hard to run efficiently in streaming mode, though whisper-streaming exists. https://kyutai.org/next/stt is a natively streaming STT model.
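
To make the difficulty concrete: a non-streaming model has to re-decode a growing audio buffer, and something has to decide which words are stable enough to show. Below is a minimal sketch of one such policy, committing only the prefix that two consecutive hypotheses agree on. The names `common_prefix` and `PrefixAgreement` are hypothetical, and this is not taken from whisper-streaming's code.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest shared prefix of two word lists."""
    out: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


class PrefixAgreement:
    """Commit words only once two consecutive hypotheses agree on them."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already shown to the user
        self.previous: List[str] = []   # hypothesis from the last decode

    def update(self, hypothesis: List[str]) -> List[str]:
        """Take the full hypothesis for the buffer so far; return newly committed words."""
        agreed = common_prefix(self.previous, hypothesis)
        self.previous = list(hypothesis)
        # Slicing past the end yields [], so committed words are never retracted.
        new_words = agreed[len(self.committed):]
        self.committed.extend(new_words)
        return new_words
```

Each call to `update` stands for a full re-decode of the buffer, which is where the inefficiency comes from: the model repeats work on audio it has already seen.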

1 comment

woodson 305 days ago

There are many streaming ASR models based on CTC or RNN-T. See, for example, sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, speaker diarization, and more.
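
A rough illustration of why CTC suits streaming: greedy CTC decoding only needs the previous frame's label as state, so audio can be fed in chunks of any size. This is a minimal sketch in plain Python, not sherpa-onnx's API; the class name and the choice of index 0 for the blank label are assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


class StreamingCTCGreedyDecoder:
    """Greedy CTC decoding that carries its state across chunks."""

    def __init__(self, blank: int = BLANK) -> None:
        self.blank = blank
        self.prev = blank            # argmax label of the last frame seen
        self.tokens: List[int] = []  # all tokens emitted so far

    def accept_chunk(self, frame_scores: Sequence[Sequence[float]]) -> List[int]:
        """Consume per-frame label scores; return tokens emitted by this chunk."""
        new_tokens: List[int] = []
        for scores in frame_scores:
            label = max(range(len(scores)), key=scores.__getitem__)
            # CTC collapse: skip blanks and repeats of the previous frame's label.
            if label != self.blank and label != self.prev:
                new_tokens.append(label)
            self.prev = label
        self.tokens.extend(new_tokens)
        return new_tokens
```

Because `prev` persists between calls, a label repeated across a chunk boundary is still collapsed, and the result is the same as decoding the whole utterance at once.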
