True, but the attention layers still need to be able to look at all the shots, for example to make sure the background of a room shown at the start of the movie matches the background of the same room at the end.
Obviously you could do 'human-assisted' movie making, where humans decide the storyboard and give directions for each shot, and then attending across all the shots isn't necessary.