You can also use software to detect “cuts” in the video, which should give you better frames than just grabbing six evenly spaced ones: you can take one frame per shot instead.
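To make the idea concrete, here is a minimal sketch of cut detection by frame differencing. It assumes the frames are already decoded into flat lists of grayscale pixel values (0 to 255), which a real pipeline would get from a video decoder. The function names and the default threshold are hypothetical, picked for illustration, and the threshold would need tuning per video.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two frames, scaled to 0..1."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def detect_cuts(frames, threshold=0.3):
    """Return indices i where frame i differs sharply from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


def pick_frames(frames, threshold=0.3):
    """Pick the middle frame index of each shot (the span between cuts)."""
    bounds = [0] + detect_cuts(frames, threshold) + [len(frames)]
    return [(start + end) // 2 for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy "video": three shots of flat gray frames, 4 pixels each.
frames = [[10] * 4] * 3 + [[200] * 4] * 4 + [[90] * 4] * 2
cuts = detect_cuts(frames)
picked = pick_frames(frames)
```

Plain pixel differencing like this tends to misfire on fast motion and to miss slow fades, so treat it as a starting point rather than a robust detector.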
This is a task called "video summarization". See https://paperswithcode.com/task/video-summarization. I guess the whole project amounts to summarizing video + subtitles + text into pictures + text.
I used something like this a few years ago in a project sort of similar to this one. There's a bunch of parsing and processing to do with that, and the "0.3" value is ... fiddly, but it worked pretty well:
For this project, I want to find an A.I. solution for finding the most 'interesting' frames. I'm not even sure how to measure interestingness yet; it might be the presence of text, the presence of a human ...