Generally yes, when it produces sane output at all. But while YT can get stuff comically wrong, I've never seen it just go off the rails and start hallucinating and mindlessly repeating itself, which Whisper sometimes does, especially if you're also trying to get it to translate something. For example, Whisper will sometimes output a stream of things like "Please subscribe to my channel and follow me on Twitter!" or "Thank you for watching."
On one source I tried the other day, the first 90 seconds or so is just generic opening music with no speech, but Whisper "transcribes" it as "This is the end of the video. Thank you for watching. Please subscribe to the channel if you like. See you in the next video. Thank you for watching. Please subscribe to the channel if you like. Thank you for watching. ..." If you help it along by cutting the source up into only the spoken segments, you can get it to do better, but just throwing it at a directory of material is probably going to leave you disappointed.
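For what it's worth, here's a rough sketch of the "cut it into spoken segments" idea. It assumes you already have a list of (start, end) times for the speech, whether marked by hand or from some voice-activity detector; the interval values, the helper names and the segment_NNN.wav naming are all made up for illustration. It only builds the ffmpeg command lines, it doesn't run them.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_speech_intervals(intervals: List[Interval], pad: float = 0.25,
                           max_gap: float = 1.0) -> List[Interval]:
    """Pad each interval, then merge neighbours separated by at most max_gap seconds."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        start, end = max(0.0, start - pad), end + pad
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous span: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ffmpeg_cut_commands(src: str, intervals: List[Interval]) -> List[List[str]]:
    """Build one ffmpeg argument list per interval (nothing is executed here)."""
    commands = []
    for i, (start, end) in enumerate(intervals):
        out = f"segment_{i:03d}.wav"  # hypothetical output naming
        commands.append([
            "ffmpeg", "-y", "-i", src,
            "-ss", f"{start:.2f}", "-to", f"{end:.2f}",
            "-vn", "-ac", "1", "-ar", "16000",  # audio only, mono, 16 kHz
            out,
        ])
    return commands


# Made-up example: about 90 s of intro music, then speech starting around 92 s.
speech = [(92.0, 97.5), (98.0, 110.2), (130.0, 141.0)]
spans = merge_speech_intervals(speech)
commands = ffmpeg_cut_commands("talk.mp4", spans)
```

Each entry in commands can be passed to subprocess.run, and the resulting clips fed to Whisper one at a time. You lose the original timestamps that way unless you add each span's start offset back in afterwards.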
Then sometimes it does something surprising: on a J-pop song, after hallucinating a bit during the intro, it spat out a translation in the form you might find on a lyrics site, that is, each line was "japanese-characters romaji-version english-translation". I haven't been able to get it to do that again (even for the same source).
Yeah, it can help a bit with looping, but it introduces other problems. I recalled from earlier that a combo of tweaking the no_speech_threshold and logprob_threshold settings helped somewhat, though trying it again on a random video, it doesn't do much. It still hallucinates a stream of captions (albeit non-repetitive, though one run had several Touhou-related lines) for what should be 4 minutes of looping background music before the first sentence. If all one needs Whisper for is transcribing English, though, I still think it's pretty decent. On my test video it now 'correctly' transcribes the music as ♪ when I ask it to just transcribe it as English.
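In case anyone wants to poke at those two settings, this is roughly what it looks like with the openai-whisper Python package. The threshold values, the "small" model choice and the helper names are just illustrative assumptions, not tuned recommendations. As far as I understand it, Whisper only skips a window as silence when no_speech_prob is above no_speech_threshold and the average log-probability is below logprob_threshold, so the extra post-filter at the end is my own stricter check on top, not something the library does.

```python
from typing import Any, Dict, List


def build_transcribe_options(translate: bool = False,
                             no_speech_threshold: float = 0.4,
                             logprob_threshold: float = -0.8) -> Dict[str, Any]:
    """Keyword arguments for whisper's transcribe(); values are illustrative guesses.

    Library defaults are no_speech_threshold=0.6 and logprob_threshold=-1.0.
    """
    return {
        "task": "translate" if translate else "transcribe",
        "no_speech_threshold": no_speech_threshold,
        "logprob_threshold": logprob_threshold,
    }


def drop_suspect_segments(segments: List[Dict[str, Any]],
                          max_no_speech_prob: float = 0.6,
                          min_avg_logprob: float = -1.0) -> List[Dict[str, Any]]:
    """Hypothetical post-filter: drop segments that look like non-speech OR low confidence."""
    return [
        seg for seg in segments
        if seg["no_speech_prob"] <= max_no_speech_prob
        and seg["avg_logprob"] >= min_avg_logprob
    ]


def transcribe_file(path: str, translate: bool = False) -> List[Dict[str, Any]]:
    """Run Whisper on one file and return the filtered segments."""
    import whisper  # imported lazily so the helpers above work without it installed

    model = whisper.load_model("small")  # model size is an arbitrary choice here
    result = model.transcribe(path, **build_transcribe_options(translate))
    return drop_suspect_segments(result["segments"])


# Made-up segments in the shape whisper returns, to show what the filter does.
fake_segments = [
    {"text": "Thank you for watching.", "no_speech_prob": 0.91, "avg_logprob": -0.35},
    {"text": "Okay, let's get started.", "no_speech_prob": 0.03, "avg_logprob": -0.22},
    {"text": "Please subscribe.", "no_speech_prob": 0.20, "avg_logprob": -1.40},
]
kept = drop_suspect_segments(fake_segments)
```

With the made-up numbers above, only the middle segment survives. On real music intros the probabilities often don't separate this cleanly, which matches my experience that the thresholds alone don't do much.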
I'd assume Whisper will be better than the YT auto captions for sure, especially if you choose the right model.