I haven’t used faster-whisper, so I can’t compare performance, but whisper.cpp does support CUDA via cuBLAS, and it’s noticeably faster than the CPU version. I used it earlier this year to generate subtitles for six seasons of an old TV show I backed up from DVD that didn’t include subtitles on the disc.
FWIW, decent acceleration works on any AVX2-capable CPU. I get realtime speed for everything but the large models on a recent Ryzen system. Apple silicon is good, but not as special as folks think!
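To be concrete about what I mean by "realtime": transcription takes no longer than the audio itself runs. A rough sketch of that check, where the function names and the numbers are made up for illustration (not measurements from my machine):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of transcription time to audio length; 1.0 or less means realtime or better."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_realtime(processing_seconds: float, audio_seconds: float) -> bool:
    """True when the clip was transcribed at least as fast as it plays back."""
    return real_time_factor(processing_seconds, audio_seconds) <= 1.0


# Hypothetical example: a 60 s clip transcribed in 30 s of wall-clock time.
rtf = real_time_factor(30.0, 60.0)
print(f"RTF = {rtf:.2f}, realtime = {is_realtime(30.0, 60.0)}")
```

Time the transcription run yourself (e.g. with `time.perf_counter()`) and plug in the clip length to see where a given model lands on your hardware.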
We use it for our Willow Inference Server, which has an API that can be used directly, like OP's project, and supports all Whisper models, TTS, etc.:
https://github.com/toverainc/willow-inference-server
The benchmarks are pretty incredible (largely thanks to CTranslate2).