The simplest explanation is often the most probable one.
Why would you reach for a cluster of machines working in parallel when you could simply retrieve the auto-generated transcript that already sits on YouTube's servers?
Also, other comments have pointed out that the transcripts are identical to the ones created by YouTube, which would be unlikely if this service were creating transcripts of its own.