| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by busup 918 days ago

It's beam size 1. From my quick tests on a Colab T4, CTranslate2 (faster-whisper's backend) is about 30% faster with like for like settings. I decoded the audio, got mel features, split into 30s segments, and ran it batched (beam size 1, batch size 24, no temperature fallback passes). Takes a bit more effort than a cli utility but isn't too hard.

Side note, the insanely fast whisper readme gives benchmarks on an A100 but only the FA2 lines were. The rest were on a T4 looking at the notebooks/history. Turing doesn't support FA2 so the gap should be smaller with it, but based on the distil-whisper paper CTranslate2 is probably still faster.

TensorRT-LLM might be faster but I haven't looked into it yet.

1 comments

sanchit-gandhi 918 days ago

Hugging Face Whisper (the backend to insanely-fast-whisper) now supports PyTorch SDPA attention with PyTorch>=2.1.1

It's enabled by default with the latest Transformers version, so just make sure you have:

* torch>=2.1.1

* transformers>=4.36.0

link

busup 918 days ago

Nice, thanks for your work on everything Whisper related. I tested it a couple weeks ago which largely matched the results in the insanely fast whisper notebook. Comparison was with BetterTransformers.

I just reran the notebook with 4.36.1 (minus the to_bettertransformer line) but it was slower (the batch size 24 section took 8 vs 5 min). Is there something I need to change? Going back to 4.35.2 gives the old numbers so the T4 instance seems fine.

link