| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by java_beyb 912 days ago

well, deepgram might be the fastest among cloud-dependent APIs, like Speechmatics and Assembly AI mentioned above. -but- it cannot be faster than local or smaller models as you mentioned.

Among local solutions, Whisper SDK doesn't support streaming, I haven't seen any good workarounds or successfully implemented it. VOSK, DeepSpeech, Kaldi, et al were good once upon a time... Picovoice seems to be doing well.

I was planning to work on this: https://picovoice.ai/blog/chatgpt-ai-virtual-assistant-in-py... using Eleven Labs and Cheetah. Hope I can crave some time

1 comments

jkachmar 912 days ago

unless i'm misunderstanding `whisper.cpp` seems to support streaming & the repository includes a native example[0] and a WASM example[1] with a demo site[2].

[0]: https://github.com/ggerganov/whisper.cpp/tree/master/example...

[1]: https://github.com/ggerganov/whisper.cpp/blob/master/example...

[2]: https://whisper.ggerganov.com/stream/

java_beyb 911 days ago

have you tried it? i mean for fun, it wouldn't hurt for sure and ggerganov is doing amazing stuff. kudos to him.

but whisper is designed to process audio files in 30-second batches if I'm not mistaken. it's been a while since whisper released, lol. These workarounds make the window smaller but it doesn't change the fact that they're workarounds. you can adjust, modify, or manipulate the model. You can't write or train it from scratch. check out the issues referring to the real-time transcription in the repo.

can you use it? yes would it perform better than Deepgram? -although it's an API and probably not the best API- I am not sure. would i use it in my money-generating application? absolutely not.