Last time I tried Whisper, it hallucinated an elaborate conversation from sounds of slapping and moaning, and it took minutes to spit out every single line of it.
If I remember correctly, the Whisper documentation actually recommends trimming non-speech portions, since the models hallucinate heavily during them.