If I understood correctly, VAD has superior results than using ffmpeg silencedetect + silentremove, right?
I think latest version of ffmpeg could use whisper with VAD[1], but I still need to explore how with a simple PoC script
I'd love to know more about the post-processing prompt, my guess is that looks like an improved version of `semantic correction` prompt[2], but I may be wrong ¯\_(ツ)_/¯ .