by yujonglee
304 days ago
I use VAD to chunk audio. Whisper and Moonshine both work on one chunk at a time, but for Moonshine:

> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.

Also, with Kyutai we can feed continuous audio in and get continuous text out.

- https://github.com/moonshine-ai/moonshine
- https://docs.hyprnote.com/owhisper/configuration/providers/k...
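
For anyone wondering what "use VAD to chunk audio" looks like in practice, here is a rough sketch. It is not the VAD I actually run: it swaps in a plain RMS energy threshold for a real model, and `vad_chunks`, `threshold`, and `min_silence_ms` are names and defaults made up for illustration. The idea is to close a chunk only after silence has lasted a while, so a short pause does not split a sentence.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def vad_chunks(samples, sample_rate, frame_ms=30, threshold=0.02, min_silence_ms=300):
    """Return (start, end) sample indices of speech chunks.

    Hypothetical energy-based stand-in for a real VAD model.
    A chunk is closed only after `min_silence_ms` of consecutive
    quiet frames, so brief pauses stay inside the same chunk.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_silent = max(1, min_silence_ms // frame_ms)

    chunks = []
    start = None          # first sample of the chunk being built
    last_voiced_end = 0   # end of the most recent voiced frame
    silent = 0            # consecutive quiet frames seen so far

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if rms(frame) >= threshold:
            if start is None:
                start = i
            last_voiced_end = i + len(frame)
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_silent:
                # Silence lasted long enough: close the chunk at the last voiced frame.
                chunks.append((start, last_voiced_end))
                start = None
                silent = 0

    if start is not None:
        chunks.append((start, last_voiced_end))
    return chunks
```

Each `(start, end)` pair would then be sliced out of the buffer and passed to the model. With Moonshine the cost of each call tracks the chunk length, per the README quote above, while Whisper handles every chunk as a 30-second window.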
|
The short duration effectively means that the transcription will start producing nonsense as soon as a sentence is cut up in the middle.