HN Mirror

by v7n 640 days ago

I gave this a shot using speech-to-speech¹, modified so that it skips the LLM/AI assistant step and just repeats back what it thinks I said while displaying the text.

For longer sentences, my perception is that Moonshine performs at 80-90% of what Whisper² can do while using considerably fewer resources. On shorter, two-word utterances it nosedived for some reason.

These numbers don't mean much, but when paired with MeloTTS, Moonshine ate up 1.2 GB of my GPU's memory and Whisper² ate up 2.5 GB.

¹ https://github.com/huggingface/speech-to-speech
² distil-whisper/distil-large-v3