by v7n
593 days ago
I gave this a shot using speech-to-speech¹, modified so that it skips the LLM/AI assistant part and just repeats back what it thinks I said and displays the text. For longer sentences, my perception is that Moonshine performs at 80-90% of what Whisper² could do while using considerably fewer resources. With shorter, two-word utterances, it nosedived for some reason. These numbers don't mean much, but when paired with MeloTTS, Moonshine and Whisper² used 1.2 GB and 2.5 GB of my GPU's memory, respectively.

¹ https://github.com/huggingface/speech-to-speech
² distil-whisper/distil-large-v3
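For anyone wanting to try the same thing, the "skip the LLM and just repeat back" change amounts to swapping the LLM stage for a pass-through. Here's a minimal sketch, assuming a pipeline whose stages talk to each other over queues. `echo_stage`, the queue names, and the `None` shutdown sentinel are hypothetical illustrations, not names from the speech-to-speech codebase.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not from speech-to-speech


def echo_stage(stt_out, tts_in):
    """Hypothetical stand-in for the LLM stage: show each transcript and
    forward it to the TTS stage unchanged instead of generating a reply."""
    while True:
        text = stt_out.get()
        if text is SENTINEL:
            tts_in.put(SENTINEL)  # propagate shutdown to the TTS stage
            break
        print(f"Heard: {text}")  # display what the STT model thinks was said
        tts_in.put(text)  # TTS speaks the transcript back verbatim


if __name__ == "__main__":
    stt_out = queue.Queue()  # would be fed by the STT model (Moonshine/Whisper)
    tts_in = queue.Queue()  # would be consumed by the TTS model (e.g. MeloTTS)
    worker = threading.Thread(target=echo_stage, args=(stt_out, tts_in))
    worker.start()
    for utterance in ["turn left", "what time is it in Tokyo"]:
        stt_out.put(utterance)
    stt_out.put(SENTINEL)
    worker.join()
```

Keeping the STT and TTS stages untouched and only replacing the middle one means any accuracy difference you hear comes from the STT model alone.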