| HN Mirror



	by atty 919 days ago
	I think this is using the OpenAI Whisper repo? If they want a real comparison, they should be comparing MLX to faster-whisper or insanely-fast-whisper on the 4090. Faster-whisper runs sequentially, while insanely-fast-whisper batches the audio in 30-second chunks. We use Whisper in production, and these are our findings: we use faster-whisper because we find the quality is better when you include the previous segment's text. For comparison, we find that faster-whisper is generally 4-5x faster than openai/whisper, and insanely-fast-whisper can be another 3-4x faster than faster-whisper.
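The two ratios above are measured against different baselines, so the implied end-to-end gap over openai/whisper is their product. A quick sketch of that arithmetic, assuming the ratios compose (which only holds if both were measured on the same hardware and audio):

```python
# Speedups quoted above, each relative to the previous tool.
# These are the commenter's production observations, not benchmarks.
faster_vs_openai = (4, 5)      # faster-whisper vs openai/whisper
insanely_vs_faster = (3, 4)    # insanely-fast-whisper vs faster-whisper

# Speedups compose multiplicatively: if B is 4x faster than A and
# C is 3x faster than B, then C is 12x faster than A.
low = faster_vs_openai[0] * insanely_vs_faster[0]
high = faster_vs_openai[1] * insanely_vs_faster[1]
print(f"insanely-fast-whisper vs openai/whisper: {low}x to {high}x")
# prints: insanely-fast-whisper vs openai/whisper: 12x to 20x
```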

moffkalast 919 days ago

Is insanely-fast-whisper fast enough to actually run on the CPU and still transcribe in realtime? I see that none of these are running quantized models; it's still fp16. Seems like there's more speed left to be found.

Edit: I see it doesn't support CPU inference yet; it should be interesting once that's added.
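Whether a setup "transcribes in realtime" is usually judged by its real-time factor (RTF): processing time divided by audio duration, where anything below 1.0 keeps up with the incoming audio. A minimal sketch; the helper names and the 45 s / 60 s figures are hypothetical:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_realtime(processing_seconds: float, audio_seconds: float) -> bool:
    # An RTF below 1.0 means transcription finishes faster than the audio plays.
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Hypothetical measurement: 45 s of compute to transcribe a 60 s clip.
print(real_time_factor(45.0, 60.0), is_realtime(45.0, 60.0))  # 0.75 True
```

An RTF below 1.0 only guarantees throughput; a live captioning setup also has to care about the latency of each chunk.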

atty 919 days ago

Insanely fast whisper is mainly taking advantage of a GPU’s parallelization capabilities by increasing the batch size from 1 to N. I doubt it would meaningfully improve CPU performance unless you’re finding that running whisper sequentially is leaving a lot of your CPU cores idle/underutilized. It may be more complicated if you have a matrix co-processor available, I’m really not sure.
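To illustrate the batch-size point: batching doesn't change the math, it turns N small matrix-vector products into one large matrix-matrix product, which is the kind of work a GPU parallelizes well. A toy sketch with NumPy, using a single hypothetical linear layer rather than anything Whisper-specific:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 80, 64, 24  # hypothetical sizes: 80 input features, batch of 24
W = rng.standard_normal((d_out, d_in))
inputs = rng.standard_normal((n, d_in))

# Batch size 1: one small matrix-vector product per input, run one after another.
sequential = np.stack([W @ x for x in inputs])

# Batch size N: one matrix-matrix product covering all inputs at once,
# giving the hardware enough parallel work to keep its cores busy.
batched = inputs @ W.T

assert np.allclose(sequential, batched)  # same results, fewer (larger) calls
```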

youssefabdelm 919 days ago

Does insanely-fast-whisper use beam size of 5 or 1? And what is the speed comparison when set to 5?

Ideally it also exposes that parameter to the user.

For me, speed comparisons seem moot when quality is sacrificed; I'm working with very poor-quality audio, so transcription quality matters.
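For context on the beam size question: beam size 1 is greedy decoding (always take the single most likely next token), while beam size 5 keeps the five best partial transcripts at every step, which costs more compute per step but can recover a better overall sequence. A toy sketch over a hand-written probability table; the numbers are hypothetical and this is not Whisper's decoder, which also handles end-of-sequence tokens, length normalization and timestamps:

```python
import math

# Toy next-token distributions P(next | prefix), with hypothetical numbers
# chosen so that greedy decoding and beam search disagree.
MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"X": 0.55, "Y": 0.45},
    ("B",): {"X": 0.9, "Y": 0.1},
}
STEPS = 2


def beam_search(beam_size: int):
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(STEPS):
        candidates = []
        for prefix, score in beams:
            for token, p in MODEL[prefix].items():
                candidates.append((prefix + (token,), score + math.log(p)))
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    best_prefix, best_score = beams[0]
    return best_prefix, math.exp(best_score)


print(beam_search(1))  # greedy commits to "A" and ends at ('A', 'X'), p ~ 0.33
print(beam_search(2))  # keeps "B" alive and finds ('B', 'X'), p ~ 0.36
```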

busup 918 days ago

It's beam size 1. From my quick tests on a Colab T4, CTranslate2 (faster-whisper's backend) is about 30% faster with like-for-like settings. I decoded the audio, got mel features, split them into 30 s segments, and ran them batched (beam size 1, batch size 24, no temperature fallback passes). It takes a bit more effort than a CLI utility but isn't too hard.
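The preprocessing described above (mel features split into 30 s segments, then batched) can be sketched with NumPy. This assumes Whisper's standard front end: 16 kHz audio and a 160-sample hop, giving 100 frames per second and 3,000 frames per 30 s window; most models use 80 mel bins, while large-v3 uses 128. The helper names are hypothetical, and the model call on each batch is left out:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
HOP_LENGTH = 160       # 10 ms hop, i.e. 100 mel frames per second
CHUNK_SECONDS = 30     # Whisper's fixed input window
FRAMES_PER_CHUNK = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH  # 3000 frames


def split_into_segments(mel: np.ndarray) -> np.ndarray:
    """Split (n_mels, n_frames) features into (n_segments, n_mels, 3000).

    The tail is zero-padded for simplicity; Whisper's own feature extractor
    pads the audio before computing log-mels, so its padded frames are not zeros.
    """
    n_mels, n_frames = mel.shape
    n_segments = max(1, -(-n_frames // FRAMES_PER_CHUNK))  # ceiling division
    padded = np.zeros((n_mels, n_segments * FRAMES_PER_CHUNK), dtype=mel.dtype)
    padded[:, :n_frames] = mel
    # Consecutive 3000-frame slices of each row become separate segments.
    return padded.reshape(n_mels, n_segments, FRAMES_PER_CHUNK).transpose(1, 0, 2)


def iter_batches(segments: np.ndarray, batch_size: int = 24):
    """Yield batches of segments; the last batch may be smaller."""
    for start in range(0, len(segments), batch_size):
        yield segments[start:start + batch_size]


# Hypothetical input: 80 mel bins x 15 minutes of frames (100 frames per second).
mel = np.random.default_rng(0).random((80, 90_000), dtype=np.float32)
segments = split_into_segments(mel)
print(segments.shape)                                    # (30, 80, 3000)
print([len(batch) for batch in iter_batches(segments)])  # [24, 6]
```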

Side note: the insanely-fast-whisper README gives benchmarks on an A100, but only the FA2 lines were actually run on an A100; looking at the notebooks and history, the rest were on a T4. Turing doesn't support FA2, so the gap should be smaller with it, but based on the distil-whisper paper, CTranslate2 is probably still faster.
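The FA2 point comes down to GPU generation: FlashAttention-2 targets Ampere (compute capability 8.0) and newer, and the T4 is a Turing card (7.5). A small sketch of that check; on a real machine `torch.cuda.get_device_capability()` returns the same `(major, minor)` tuple, but here the values are hard-coded from NVIDIA's published tables and the helper name is hypothetical:

```python
# Compute capabilities of the GPUs mentioned in the thread.
GPUS = {
    "T4 (Turing)": (7, 5),
    "A100 (Ampere)": (8, 0),
    "RTX 4090 (Ada Lovelace)": (8, 9),
}


def supports_flash_attention_2(capability: tuple) -> bool:
    # Assumption: FlashAttention-2 requires sm_80 (Ampere) or newer.
    return capability >= (8, 0)


for name, cap in GPUS.items():
    verdict = "yes" if supports_flash_attention_2(cap) else "no"
    print(f"{name}: sm_{cap[0]}{cap[1]} -> FA2 {verdict}")
```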

TensorRT-LLM might be faster but I haven't looked into it yet.

sanchit-gandhi 918 days ago

Hugging Face Whisper (the backend to insanely-fast-whisper) now supports PyTorch SDPA (scaled dot-product attention) with PyTorch>=2.1.1.

It's enabled by default with the latest Transformers version, so just make sure you have:

* torch>=2.1.1

* transformers>=4.36.0
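A quick way to confirm an environment meets those minimums, using only the standard library. The `parse` and `meets_minimum` helpers are hypothetical simplifications; `packaging.version.Version` handles the full versioning rules. In these Transformers versions SDPA should also be requestable explicitly with `attn_implementation="sdpa"` when calling `from_pretrained`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the comment above.
MINIMUMS = {"torch": "2.1.1", "transformers": "4.36.0"}


def parse(v: str) -> tuple:
    """Rough parse of a release number, e.g. '2.1.1+cu121' -> (2, 1, 1).

    Local tags (+cu121) and pre-release suffixes (.dev0, rc1) are ignored.
    """
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '4.36' compares equal to '4.36.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


for pkg, minimum in MINIMUMS.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (need >= {minimum})")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{pkg} {installed}: {status} (need >= {minimum})")
```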

busup 918 days ago

Nice, thanks for your work on everything Whisper-related. I tested it a couple of weeks ago, and the results largely matched those in the insanely-fast-whisper notebook. That comparison was with BetterTransformer.

I just reran the notebook with 4.36.1 (minus the to_bettertransformer line), but it was slower (the batch size 24 section took 8 min instead of 5). Is there something I need to change? Going back to 4.35.2 gives the old numbers, so the T4 instance seems fine.
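For reference, 8 min versus 5 min is a 1.6x slowdown. When chasing a regression like this between library versions, a small timing helper with warm-up runs and a median over repeats helps rule out one-off costs. A minimal sketch; `benchmark` is a hypothetical helper, and for GPU work the timed function should end with `torch.cuda.synchronize()` so queued kernels are included in the measurement:

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> float:
    """Return the median wall-clock time of fn() in seconds, after warm-up runs."""
    for _ in range(warmup):
        fn()  # first calls often include one-off costs (initialization, caching)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with a stand-in workload; swap in a call to the pipeline under test.
print(f"{benchmark(lambda: sum(range(1_000_000))):.4f} s")
```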

atty 919 days ago

Our comparisons were a little while ago, so I apologize, I can’t remember whether we used BS 1 or 5; whichever we picked, we were consistent across models.

Insanely fast whisper (god I hate the name) is really a CLI around Transformers’ whisper pipeline, so you can just use that and use any of the settings Transformers exposes, which includes beam size.

We also deal with very poor audio, which is one of the reasons we went with faster whisper. However, we have identified failure modes in faster whisper that are only present because of the conditioning on the previous segment, so everything is really a trade off.
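A simplified sketch of what conditioning on the previous segment means, and why it can cause failure loops. It assumes a hypothetical `decode(window, prompt)` function and passes only the immediately preceding text as the prompt (real implementations pass a longer, truncated history of previous tokens). The stub decoder is also hypothetical and mimics a repetition failure on silent windows:

```python
from typing import Callable, List, Optional

# Hypothetical decoder signature: (audio_window, prompt_text) -> transcript text.
Decoder = Callable[[Optional[str], str], str]


def transcribe_windows(windows, decode: Decoder,
                       condition_on_previous_text: bool = True) -> List[str]:
    texts: List[str] = []
    for window in windows:
        # With conditioning, the previous output becomes the prompt. This helps
        # continuity, but a bad segment can also carry into the next window.
        prompt = texts[-1] if (condition_on_previous_text and texts) else ""
        texts.append(decode(window, prompt))
    return texts


def stub_decode(window: Optional[str], prompt: str) -> str:
    # A silent window (None) makes the stub "hallucinate" by echoing its prompt.
    return prompt if window is None else window


windows = ["hello there", None, None, "goodbye"]
print(transcribe_windows(windows, stub_decode))
# ['hello there', 'hello there', 'hello there', 'goodbye']
print(transcribe_windows(windows, stub_decode, condition_on_previous_text=False))
# ['hello there', '', '', 'goodbye']
```

openai/whisper's own documentation for `condition_on_previous_text` makes the same trade-off: disabling it can make text less consistent across windows, but the model becomes less prone to getting stuck in failure loops such as repetition.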

sanchit-gandhi 918 days ago

Indeed, insanely-fast-whisper supports beam search with a small modification to the code snippet here: https://huggingface.co/openai/whisper-large-v3

Just call the pipeline with:

result = pipe(sample, generate_kwargs={"num_beams": 5})

PH95VuimJjqBqy 919 days ago

yeah well, I find that super-duper-insanely-fast-whisper is 3-4x faster than insanely-fast-whisper.

atty 919 days ago

Yes I am not a fan of the naming either :)
