Don't worry, it was just an amusing and honestly really impractical idea :)
The idea would be to make an educated guess at where each word occurs in the video - going off the time and subtitle data from pysrt - and build a dict linking words to when they occur in the video. You could then use MoviePy and stitch together a video version of the generated dialogue, by looking up the appropriate clip for each word.
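For what it's worth, here is a rough sketch of just the "dict linking words to when they occur" part. It is only a guess at one way to do it: it assumes each subtitle has already been turned into a `(start_seconds, end_seconds, text)` tuple (with pysrt you would build these from each subtitle's start, end and text), and it splits the subtitle's time span across its words in proportion to word length. The function name `estimate_word_times` and the length-based heuristic are my own assumptions, not anything pysrt or MoviePy provides.

```python
from collections import defaultdict


def estimate_word_times(subtitles):
    """Guess when each word is spoken.

    `subtitles` is an iterable of (start_seconds, end_seconds, text) tuples.
    Returns a dict mapping a lowercased word to a list of (start, end)
    estimates, one per occurrence.
    """
    index = defaultdict(list)
    for start, end, text in subtitles:
        words = text.split()
        if not words or end <= start:
            continue  # nothing usable in this subtitle
        total_chars = sum(len(w) for w in words)
        cursor = start
        for word in words:
            # Heuristic (an assumption): longer words take longer to say.
            span = (end - start) * len(word) / total_chars
            key = word.lower().strip(".,!?;:\"'")
            if key:
                index[key].append((cursor, cursor + span))
            cursor += span
    return dict(index)


# The subtitle quoted in this thread: 00:00:23,060 --> 00:00:24,619
subs = [(23.060, 24.619, "give a turnaround version")]
word_times = estimate_word_times(subs)
```

Each `(start, end)` pair could then be used to cut a short clip out of the video with MoviePy and the clips joined in the order of the generated dialogue. The estimates will be rough, since real speech has pauses and uneven pacing, so expect the cuts to clip or overlap neighbouring words.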
Ahh, that does make sense now, and I think it could be very feasible with a much more complex, sort of blended NN, since the .srt files do have the time for each subtitle phrase
(i.e.
7
00:00:23,060 --> 00:00:24,619
give a turnaround version
)
but I am not sure of the best way to go about doing something like this.