|
Checking online, this [1] appears to be one of the most heavily referenced libraries on Stack Overflow for downloading both user-entered and automatically generated transcripts (Python-based).

[1] https://github.com/jdepoix/youtube-transcript-api

Notably, Google really needs an obvious API endpoint for this kind of call. If thousands of programmers are all rolling their own implementations, a large number of them are probably downloading the full video and transcribing it themselves as part of their data harvesting.

Honestly, I'm kind of surprised it has taken this long for YouTube to fall prey to massive data-harvesting campaigns. According to this article [2] and this paper on YouTube data statistics [3], there are ~14,000,000,000 videos on YouTube with a mean length of 615 seconds (~10 minutes). You'd think people would be interested in:

8,610,000,000,000 seconds
143,500,000,000 minutes
2,391,666,666 hours
3,274,083 months
272,840 years
27,284 decades
2,728 centuries
273 millennia
of live-action video on nearly every single subject in human existence.

Also, the paper is really cool, and extremely sobering about being a "content creator", based on the 1% getting all the views.

[2] "What We Discovered on ‘Deep YouTube’", https://www.theatlantic.com/technology/archive/2024/01/how-m...
[3] "Dialing for Videos: A Random Sample of YouTube", https://journalqd.org/article/view/4066/3766 |
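For anyone who wants to check the unit conversions in the list above, here is a minimal Python sketch. The video count and mean length are the figures quoted from [2] and [3]. The year length is an assumption on my part: the comment does not say which conversion it used, but a mean tropical year of about 365.2422 days (with a month taken as one twelfth of that) seems to reproduce the quoted figures to within rounding. The function and constant names are just illustrative.

```python
# Inputs quoted in the comment: ~14 billion videos, mean length 615 s.
VIDEO_COUNT = 14_000_000_000
MEAN_LENGTH_SECONDS = 615

# Assumption (not stated in the comment): a mean tropical year of
# 365.2422 days, with a month taken as one twelfth of that year.
DAYS_PER_YEAR = 365.2422


def runtime_breakdown(video_count=VIDEO_COUNT, mean_seconds=MEAN_LENGTH_SECONDS):
    """Return the total runtime of all videos expressed in several units."""
    seconds = video_count * mean_seconds
    years = seconds / 86_400 / DAYS_PER_YEAR  # 86,400 seconds per day
    return {
        "seconds": seconds,
        "minutes": seconds / 60,
        "hours": seconds / 3_600,
        "months": years * 12,
        "years": years,
        "decades": years / 10,
        "centuries": years / 100,
        "millennia": years / 1_000,
    }


if __name__ == "__main__":
    # Print each unit rounded to the nearest whole number.
    for unit, value in runtime_breakdown().items():
        print(f"{value:>22,.0f} {unit}")
```

Swapping in a different year length (say 365.25 days) shifts the months and years figures by a few units in the last digits, but does not change the overall picture.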
But I think they are probably culturally opposed to publicly exposing this sort of thing, even if it only worked via an authenticated account. It's also worth considering that doing so would make it easier for a competitor to steal the value they provide with the generated closed captions.