by T-hawk
4337 days ago
The data comes in faster than 60 fps. A camera sensor doesn't capture the entire frame instantly every 1/60 second; it scans through the frame row by row over some measurable fraction of that 1/60 second. That row-by-row readout is the quirk being exploited. Suppose the camera scans 720 lines in HD every 1/60 second. Each row is then offset in time by 1/43200 second from the row before it. A rigid object can appear slightly offset in space on each line of pixels, which indicates that sound waves perturbed it in the time gap between the capture of one line and the next. So that subframe video data can be turned back into audio at a much higher frequency than the apparent 60 Hz video sampling rate. In other words, we're not just talking about 60 frames per second from a camera. It's really perhaps 43,200 rows per second, an enormously higher sampling frequency.
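To make the arithmetic concrete, here is a minimal sketch of the row-rate calculation. It assumes an idealized sensor whose readout spreads the 720 rows evenly across the whole 1/60 second with no blanking gap; real sensors will differ, and `row_sampling` is a hypothetical helper name, not anything from the article.

```python
from fractions import Fraction

def row_sampling(rows_per_frame, frames_per_second):
    """Return (time between rows in seconds, rows per second).

    Assumption: rows are read out evenly across the full frame interval.
    """
    rows_per_second = rows_per_frame * frames_per_second
    row_interval = Fraction(1, rows_per_second)
    return row_interval, rows_per_second

interval, rate = row_sampling(720, 60)
print("time between rows (s):", interval)
print("rows per second:", rate)
# By the Nyquist limit, the highest frequency this rate could represent
# is half the sampling rate.
print("Nyquist limit (Hz):", rate / 2)
```

Under those assumptions, the row rate is well inside the range needed for audible sound, which is the whole point of the comment above.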
Yes, yes, that was completely obvious from the article. We are getting thousands of "measurements" per second.
However, each of those measurements is incredibly inaccurate. Each one is trying to detect a colour change of 1/200 of the colour range in a single pixel. You may be getting less than a single bit of entropy per measurement.
A more advanced signal processing technique would look at the longer-term picture. Sound vibrations are not a random walk: they tend to be a combination of sine waves, where the amplitude of each component changes much more slowly than the wave itself oscillates. They are therefore predictable to a certain extent, and audio compression algorithms exploit that predictability. The signal processing algorithm has to take the extremely limited information in each measurement and find sets of varying sine waves that could plausibly be producing those measurements. That may be enough to reject some of the noise we could hear in that video and clean up the sound a bit, but it is quite a hard (and CPU-intensive) processing task.
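As a rough illustration of why the longer-term picture helps, the sketch below buries a weak 440 Hz sine in noise ten times its amplitude, then estimates how strongly a few candidate frequencies are present by projecting the samples onto a sine and cosine at each one. This is a much simpler stand-in for the kind of matching described above, not the method from the article: it assumes fixed-amplitude tones, Gaussian noise, and a known list of candidate frequencies, and `tone_strength` and `noisy_tone` are hypothetical helper names.

```python
import math
import random

SAMPLE_RATE = 43200  # rows per second, taken from the 720-line, 60 fps example

def noisy_tone(freq, amplitude, noise_sigma, n, seed=0):
    """Generate n samples of a sine wave plus Gaussian noise (assumed noise model)."""
    rng = random.Random(seed)
    return [
        amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
        + rng.gauss(0, noise_sigma)
        for i in range(n)
    ]

def tone_strength(samples, freq):
    """Estimate the amplitude of a sine at `freq` by correlating the
    samples with a sine and a cosine at that frequency."""
    n = len(samples)
    s = sum(x * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i, x in enumerate(samples))
    c = sum(x * math.cos(2 * math.pi * freq * i / SAMPLE_RATE)
            for i, x in enumerate(samples))
    return 2 * math.hypot(s, c) / n

# One second of data: each sample is mostly noise, but the tone is consistent.
samples = noisy_tone(freq=440, amplitude=0.1, noise_sigma=1.0, n=SAMPLE_RATE)
for candidate in (220, 440, 880):
    print(candidate, "Hz ->", round(tone_strength(samples, candidate), 3))
```

No single sample says much, but the noise averages out over tens of thousands of them while the tone does not, so the 440 Hz estimate should stand out from the other candidates. Tracking tones whose amplitude and frequency drift over time, as real sound does, is where the hard and CPU-intensive part comes in.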