by busup
918 days ago
It's beam size 1. From my quick tests on a Colab T4, CTranslate2 (faster-whisper's backend) is about 30% faster with like-for-like settings. I decoded the audio, computed mel features, split them into 30s segments, and ran them batched (beam size 1, batch size 24, no temperature fallback passes). It takes a bit more effort than a CLI utility but isn't too hard.

Side note: the insanely-fast-whisper README says its benchmarks were run on an A100, but only the FlashAttention-2 (FA2) rows were. The rest were run on a T4, going by the notebooks and history. Turing doesn't support FA2, so the gap should be smaller with FA2 enabled, but based on the distil-whisper paper CTranslate2 is probably still faster. TensorRT-LLM might be faster still, but I haven't looked into it yet.

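For anyone wanting to reproduce the segmenting and batching step, here is a minimal sketch, not the exact script I ran. It assumes 16 kHz mono audio (what Whisper expects) and uses the 30s window and batch size 24 from above. The helper names `segment_bounds` and `batches` are made up for illustration, and the mel feature extraction and the CTranslate2 call are left out.

```python
from typing import Iterator, List, Sequence, Tuple

SAMPLE_RATE = 16_000   # assumption: audio already resampled to 16 kHz mono
SEGMENT_SECONDS = 30   # fixed window length used in the test above
BATCH_SIZE = 24        # batch size used in the test above


def segment_bounds(
    num_samples: int,
    sample_rate: int = SAMPLE_RATE,
    segment_seconds: int = SEGMENT_SECONDS,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive fixed-length windows.

    The last window may be shorter than 30s; it would need zero-padding
    before feature extraction.
    """
    seg = sample_rate * segment_seconds
    return [(start, min(start + seg, num_samples))
            for start in range(0, num_samples, seg)]


def batches(items: Sequence, batch_size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# One hour of audio gives 120 windows, which fit in 5 batches of 24.
one_hour = 3600 * SAMPLE_RATE
bounds = segment_bounds(one_hour)
num_batches = len(list(batches(bounds)))
```

Each batch of windows would then be turned into mel features and passed to the model in one call with beam size 1.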
It's enabled by default with the latest Transformers version, so just make sure you have:
* torch>=2.1.1
* transformers>=4.36.0
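A quick way to check those two minimums from Python, as a rough sketch. It only compares the leading numeric part of each version string, so local tags like `+cu121` and pre-release suffixes are ignored (an assumption that is fine for a sanity check, not a full version parser). `meets_minimum` and `check_requirements` are hypothetical helper names.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions listed above.
MINIMUMS = {"torch": "2.1.1", "transformers": "4.36.0"}


def _numeric_prefix(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted-number part, e.g. '2.1.1+cu121' -> (2, 1, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric prefixes, padding the shorter one with zeros."""
    a, b = _numeric_prefix(installed), _numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return a status string per package: 'ok', 'too old (x)', or 'missing'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        status[name] = "ok" if ok else f"too old ({installed})"
    return status


print(check_requirements())
```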