by inciampati 1174 days ago
In most real-world settings, at least in my personal use, latency to a remote AI accounts for most of the usability difficulty with automated speech recognition. The larger whisper models can be run directly on a laptop using multithreading and achieve speech-to-text transcription good enough to write whole emails, papers, and documents almost entirely by voice. In fact, I've written most of this comment using an ASR system on my phone that uses whisper. While the smaller models (like the one used here) can need some correction, the bigger ones are almost perfect. Both are good enough that, for realtime interactive use, I see no future market for paid APIs.

Yesterday I wrote virtually all the prose in the manuscript while walking around with a friend and discussing it. We didn't even look at the phone.

Obviously there's an academic element here because I'm saying I'm using it for writing. But it's more of a human-centric computing thing. I'm replacing a lot of the time that my thumbs spend tapping on keys, my fingers spend tapping on a keyboard, and my eyes spend staring at the words as they appear, looking for typographical errors to correct, with time spent organizing my thoughts in a coherent way that can be spoken and read easily. I'm basically using whisper to create a new way to write that's more fluid, direct, and flows exactly as my speech does. I've tried this for years with all of the various ASR models on all the phones I've had and never been satisfied in the same way.

1 comment

Sounds great! Which app are you using for this?
"Openai Whisper Keyboard" is good. It doesn't use whisper.cpp but rather a pytorch implementation that runs on Android.

I also use whisper.el in Emacs. It's amazing. Much more powerful, but computer-based of course.

Thanks!