| HN Mirror


by CSSer 1202 days ago

Is there any chance you could expose a pathway to use a local instance of Whisper? I ask primarily because OpenAI completely open-sourced Whisper in September 2022[0]. It seems odd to me to default to, or encourage, the use of a paid service for something that appears to be available for free under the MIT license, including the models[1].

My understanding is that the only reason OpenAI even set up the paid API is that it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it, but I could see how this is just brand capitalization.

[0]: https://openai.com/blog/introducing-chatgpt-and-whisper-apis...

[1]: https://github.com/openai/whisper
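A pathway like the one asked about could be as small as a pluggable backend. The sketch below is hypothetical: `Transcriber`, `LocalWhisper`, and `make_transcriber` are illustrative names, not part of any existing tool. The only real calls are `whisper.load_model` and `model.transcribe` from the open-source `openai-whisper` package, which is imported lazily so the paid-API path keeps working even when the package isn't installed.

```python
from typing import Protocol


class Transcriber(Protocol):
    """Anything that turns an audio file path into text (hypothetical interface)."""

    def transcribe(self, audio_path: str) -> str: ...


class LocalWhisper:
    """Runs the open-source model in-process via the `openai-whisper` package."""

    def __init__(self, model_name: str = "large-v2") -> None:
        self.model_name = model_name
        self._model = None  # loaded on first use so constructing this is cheap

    def transcribe(self, audio_path: str) -> str:
        if self._model is None:
            import whisper  # pip install openai-whisper

            self._model = whisper.load_model(self.model_name)
        # transcribe() returns a dict; "text" holds the full transcript
        return self._model.transcribe(audio_path)["text"]


def make_transcriber(use_local: bool, api_backend: Transcriber) -> Transcriber:
    """Hypothetical switch: use the local model when the user opts in."""
    return LocalWhisper() if use_local else api_backend
```

Under this assumption, the API key, upload, and billing logic would stay in whatever `api_backend` the tool already has; the local path only adds a dependency and a model download.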


nonoesp 1202 days ago

If you run the large-v2 model (the more accurate one, and the one they expose via the API) on your local machine, you'll see that even though it works great, it's slow and won't work for long audio files because of memory limitations.
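Assuming the memory pressure comes from handing the whole file to the model in one call, a common workaround is to feed it fixed-length pieces. A minimal sketch, assuming the audio has already been decoded to 16 kHz mono samples (for example with `whisper.load_audio`). The 30-second window matches Whisper's input length; the 1-second overlap and the `chunk_audio` helper are illustrative choices.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio


def chunk_audio(
    samples: Sequence[float],
    chunk_seconds: float = 30.0,
    overlap_seconds: float = 1.0,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_in_seconds, chunk) pairs covering the whole input.

    Consecutive chunks overlap slightly so a word cut at one boundary
    appears in full in at least one chunk.
    """
    size = int(chunk_seconds * sample_rate)
    step = size - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    start = 0
    while start < len(samples):
        yield start / sample_rate, samples[start : start + size]
        if start + size >= len(samples):
            break  # this chunk already reaches the end of the audio
        start += step
```

With a NumPy array from `whisper.load_audio`, each chunk can be passed to `model.transcribe`, and the chunk's start time added to the returned segment timestamps.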

It's fairly easy and quick to run Whisper for free, either locally (in an Anaconda environment with Python, or via the command-line interface) or, even better, in a Google Colab notebook.

Here's a sample notebook that builds on a notebook by Pete Warden.

https://colab.research.google.com/drive/1sxsey3n0jd09MjUd9Ky...


rolisz 1202 days ago

On a 1080 Ti (a 6-year-old GPU), the large model runs at about 1x realtime (transcribing 10 minutes of audio takes 10 minutes), and I've successfully transcribed files over an hour long.


kkielhofner 1201 days ago

FWIW, an optimized implementation I've been working on comes in at roughly 70x realtime (large-v2, beam size 5) on an RTX 3090.
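To put the two speeds reported in this subthread side by side (about 1x on a 1080 Ti, roughly 70x on an RTX 3090), here is the arithmetic for a one-hour recording. Note the commenters' settings differ (large vs. large-v2 with beam size 5), so this compares reported figures, not a controlled benchmark; `processing_seconds` is a hypothetical helper.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to transcribe `audio_seconds` of audio at a given speed.

    A realtime factor of 1.0 means one minute of audio takes one minute;
    70.0 means it takes 1/70th as long.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60
print(processing_seconds(one_hour, 1.0))   # 1080 Ti figure from the thread
print(processing_seconds(one_hour, 70.0))  # RTX 3090 figure from the thread
```

At 1x the hour of audio takes the full hour; at 70x it drops to roughly 51 seconds.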


rolisz 1201 days ago

Nice! Are you going to release it publicly?


kkielhofner 1200 days ago

Great question!

We're still very early stage and in stealth, so it's not quite clear to us where our lines are with regard to special sauce/significant competitive advantage.

As the CTO (and lead dev), I'd lean towards open sourcing it (because it's awesome and we're already standing on the shoulders of open-source giants), but it may become clear that it's too differentiating to open source. As I said, it's just too early to tell.

What I can say is that if we open source it, HN will be the first to hear about it!


paxys 1202 days ago

> My understanding is that the only reason OpenAI even set up the paid API is that it "can also be hard to run [sic]". Personally, I'm skeptical. I'm not knocking them for it, but I could see how this is just brand capitalization.

Why is it hard to see that not every organization has the capability to set up its own translation cluster, provision GPUs, build frontends, handle scaling, staff on-call rotations, and regularly update models? It's not just "brand capitalization". An API that you can call to transcribe or translate a recording with zero extra work is absolutely essential for most.


cnbeining 1201 days ago

I have a pipeline set up in https://github.com/cnbeining/Whisper_Notebook/blob/master/Wh... :

- Run Voice Activity Detection for better timestamp output
- Transcribe with Whisper
- Run forced alignment to get per-word timestamps
- Create better-segmented SRT
- Translate (with multiple APIs; implemented: DeepL, Google Translate, Baidu, and a couple more)
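The "better segmented SRT" step can be sketched as follows, assuming the forced-alignment step has already produced per-word start and end times. The `Word` type, `words_to_srt` function, and the 42-character / 0.8-second thresholds are illustrative choices, not taken from the linked notebook.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds, from forced alignment
    end: float    # seconds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_chars: int = 42, max_gap: float = 0.8) -> str:
    """Group aligned words into numbered SRT cues.

    A new cue starts when adding the next word would exceed `max_chars`
    or when the pause before it is longer than `max_gap` seconds.
    """
    cues: List[List[Word]] = []
    for word in words:
        if cues:
            current = cues[-1]
            line = " ".join(w.text for w in current + [word])
            if len(line) <= max_chars and word.start - current[-1].end <= max_gap:
                current.append(word)
                continue
        cues.append([word])

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        start, end = srt_timestamp(cue[0].start), srt_timestamp(cue[-1].end)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line
    return "\n".join(blocks)
```

Splitting on pauses as well as length is what keeps cues from straddling a silence, which plain fixed-length segmentation tends to do.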


Tenoke 1202 days ago

The API is useful because not everyone has fast GPUs with 10+ GB of VRAM lying around.
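To make the 10+ GB threshold concrete: the openai/whisper README lists approximate VRAM needs per model size, from about 1 GB for tiny/base up to about 10 GB for large. A minimal sketch of picking a model from available memory; `largest_model_that_fits` is a hypothetical helper, and the figures are the README's rough estimates rather than measurements.

```python
from typing import Optional

# Approximate VRAM requirements in GB, per the openai/whisper README.
# Rough guides only: real usage depends on precision, beam size, etc.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_that_fits(available_vram_gb: float) -> Optional[str]:
    """Return the biggest model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= available_vram_gb]
    # dicts keep insertion order, so the last fitting entry is the largest
    return fitting[-1] if fitting else None
```

On an 8 GB card this suggests medium, which is one reason the hosted API serving large-v2 is attractive.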


CSSer 1201 days ago

You know, this is true. I was a bit too dismissive about it because I haven't deployed many models myself. I was assuming it was similar to many other services, but even a glance at pricing for managed GPU instances from most providers shows me that's clearly not the case.
