whisperx also adds improved timestamping, closed-caption output, and beta support for diarization (speaker labeling). Unfortunately, it doesn't seem to accept m4a out of the box, but you can use ffmpeg to convert to wav, or to mp3 if you first upgrade the sound library dependency.
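Here is a minimal sketch of the ffmpeg conversion step, wrapped in Python. It assumes ffmpeg is on your PATH and targets 16 kHz mono 16-bit PCM WAV. That is the format whisper.cpp's examples ask for, and Whisper-based tools resample to 16 kHz internally anyway. The helper names `ffmpeg_to_wav_cmd` and `convert_to_wav` are hypothetical and not part of whisperx or whisper.cpp.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ffmpeg_to_wav_cmd(src: str, dst: Optional[str] = None) -> List[str]:
    """Build (but don't run) an ffmpeg command that converts src, e.g. an
    .m4a file, to 16 kHz mono 16-bit PCM WAV. Hypothetical helper."""
    # Default output: same name as the input, with a .wav extension.
    out = dst if dst is not None else str(Path(src).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file without prompting
        "-i", src,            # input file; ffmpeg detects m4a/AAC itself
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # encode as 16-bit little-endian PCM
        out,
    ]


def convert_to_wav(src: str, dst: Optional[str] = None) -> str:
    """Run the conversion and return the output path. Hypothetical helper."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = ffmpeg_to_wav_cmd(src, dst)
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(cmd, check=True, capture_output=True)
    return cmd[-1]
```

For example, `convert_to_wav("talk.m4a")` should produce `talk.wav` next to the input, and you can pass that file to whisperx or whisper.cpp. The command builder is kept separate from the call that runs it, so you can print or inspect the exact ffmpeg invocation before running it.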
whisper.cpp isn't slow everywhere: on an M1 MacBook with the medium model, it runs faster than real time. You may lose some accuracy, though, because it uses a different search method, and you may lose more if you choose to run a smaller model.