Wouldn't it be "easy" to capture a video of a minute or so and then write an algorithm that keeps only the parts that stay unchanged across the frames? I did something like that ten years or so ago, and it worked very well (it took the mode, i.e. the most frequent value, of each pixel across X frames).
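For anyone who wants to try it, here is a rough sketch of the per-pixel mode idea, not the code I used back then. It assumes the frames are already decoded into an 8-bit NumPy array of shape (frames, height, width) or (frames, height, width, channels); the name `pixelwise_mode` is made up for this example, and colour channels are handled independently, which is a simplification.

```python
import numpy as np

def pixelwise_mode(frames):
    """Return the most frequent value of each pixel across the first axis.

    Hypothetical helper: expects a uint8 array shaped (T, H, W) or
    (T, H, W, C). Ties go to the smallest value.
    """
    frames = np.asarray(frames)
    if frames.dtype != np.uint8:
        raise TypeError("expected 8-bit frames")
    best_count = np.zeros(frames.shape[1:], dtype=np.int64)
    best_value = np.zeros(frames.shape[1:], dtype=np.uint8)
    # Visit each intensity that occurs (ascending) and keep, per pixel,
    # the value seen in the largest number of frames.
    for v in np.unique(frames):
        count = (frames == v).sum(axis=0, dtype=np.int64)
        better = count > best_count
        best_value[better] = v
        best_count[better] = count[better]
    return best_value

# Synthetic check: a static 4x4 background with one bright "passer-by"
# pixel that sits at a different position in each of the 9 frames.
background = np.arange(16, dtype=np.uint8).reshape(4, 4)
frames = np.repeat(background[None], 9, axis=0)
for t in range(9):
    frames[t, t // 4, t % 4] = 200
clean = pixelwise_mode(frames)
```

Each pixel is covered by the passer-by in at most one frame out of nine, so the mode recovers the background. Real footage has sensor noise, so in practice the values probably need to be quantised or the frames slightly smoothed first; otherwise almost no value repeats exactly.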
In many cases that will do, but there will be problems with continuously changing objects such as ads, escalators and clocks, which will come out blurred unless a fairly sophisticated algorithm is used.