I see a potential problem with moving pictures - the neural network might decide that while red was an appropriate color for the truck in the last frame, blue is more likely in this frame.
You're right, it would be a fun side effect to see the color change between different scenes :)
I guess there's not enough commercial interest in fixing these problems, but it can probably be done with the current algorithms. Precise 3D reconstruction is much more important than colour reconstruction.
Would it be possible to have some sort of semantic naming coupled to it, e.g. 'the car in this scene is red', that could be fixed by humans, maybe even by talking to the NN? That's when this stuff starts getting fun IMO - human language together with superhuman domain knowledge.
Maybe it's about a truck painting shop, and every time they paint something, it either doesn't change color, or gets a color different from what the customer ordered. I would watch that.
Actually, someone recently filmed exactly that - a 10-hour film of paint drying, as a protest against film censorship. The British Board of Film Classification awarded it a "U" rating: universal/suitable for all.
Are there any other issues or advantages that you're able to see that would be unique to movies? While I'm unable to think of a use, it seems like there's the potential for side-channel analysis via music, dialog, background noises, scripts, etc.
Script analysis would probably take much more effort, but it could work if the colour is in the movie script... in that case object detection gets important as well. Just using more training data to cover all the dog breeds and car colours would help a bit though.
How would they know that the vehicles, say, were supposed to be the same colour? I'm thinking of The Italian Job - the same model of car is used and the cars are distinguished by their colours. Also, what about when different vehicles are painted by the production team to look the same (for stunts, say)? The computer could rightly recognise them as different vehicles - how would it then know that they're supposed to be the same?
For such things it seems you'd need to check every shot change.
BW filmmakers were, of course, well aware that they were filming in BW and would select colors for sets and costumes that would look good in BW. For example, chocolate milk was used for blood. You wouldn't want a colorizer to determine the actual colors, but the intended colors!
Even today, directors rarely seem to want to film in actual color. They'll tint everything sepia, or that hideous blue-orange scheme that is so popular these days.
That the TIJ cars were distinguishable only by color was made possible by filming in color. A BW filmmaker would not make a film that required distinguishing colors that filmed as equal shades of grey.
In any case, any such colorizing system would be designed to accept a bit of guidance here and there from the artist. This is much like when an OCR'd document needs a bit of touch-up.
And even if it wasn't perfect, many BW movies would be made much more watchable, like the 1927 Wings, which is crying out to be colorized (and have a soundtrack added).
One idea here is to adopt a recent generative approach: a CNN which starts with two noise image inputs, and then repeatedly tweaks the result, together with a new noise image, over multiple steps until it does one last tweak to a final version. The noise serves as an RNG for making choices about the built-up image, I think. You could apply this recurrent idea to movies too: for the first BW frame, pass in a noise image and the BW frame, and get out a color (C) frame; now for the second BW frame, pass in the BW frame but also the C frame from before. The CNN may gradually learn to transfer colors from the C frame to the BW frame, thereby maintaining temporal coherency.
(Or you could just try to use an RNN directly and keep hidden state from frame to frame.)
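To make the frame-to-frame idea above a bit more concrete, here's a rough sketch of just the feedback loop in plain Python. Everything in it is an assumption for illustration: `colorize_frame` is a hypothetical stand-in for the trained CNN (it simply copies the previous frame's tint per pixel and rescales it to the new grey level), frames are flat lists of grey levels in [0, 1], and a seeded RNG plays the role of the noise image for the first frame. A real network would have to learn the color transfer and cope with motion, which this toy ignores.

```python
import random


def colorize_frame(bw, prev_color, rng):
    """Hypothetical stand-in for the CNN: one BW frame in, one color frame out.

    bw         -- list of grey levels in [0, 1], one per pixel
    prev_color -- list of (r, g, b) tuples from the previous output, or None
    rng        -- random.Random instance acting as the noise input
    """
    out = []
    for i, grey in enumerate(bw):
        if prev_color is None:
            # First frame: no history, so noise decides the color.
            tint = (rng.random(), rng.random(), rng.random())
        else:
            # Later frames: reuse the color chosen for this pixel last time.
            tint = prev_color[i]
        peak = max(tint)
        if peak == 0:
            # A black pixel carries no tint; fall back to neutral grey.
            tint, peak = (1.0, 1.0, 1.0), 1.0
        # Keep the channel ratios, scale so the brightest channel equals grey.
        out.append(tuple(grey * c / peak for c in tint))
    return out


def colorize_movie(bw_frames, seed=0):
    """Colorize frames in order, feeding each output into the next call."""
    rng = random.Random(seed)
    prev = None
    result = []
    for bw in bw_frames:
        prev = colorize_frame(bw, prev, rng)
        result.append(prev)
    return result


if __name__ == "__main__":
    frames = [[0.5, 1.0], [1.0, 0.5]]  # two tiny 2-pixel BW frames
    for color_frame in colorize_movie(frames, seed=1):
        print(color_frame)
```

The only point is the data flow: the truck that came out red in frame 1 stays red in frame 2 because the previous output is an input, not because the model re-guesses. The RNN variant would carry a hidden state instead of (or in addition to) the previous color frame.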