Hacker News
by mnw21cam 4336 days ago
There is some small possibility of improvement through software techniques such as data assimilation, which can use information from surrounding time-frames to improve each measurement. This assumes that the magnitude of the vibrations changes much more slowly than the vibrations themselves, which is usually true, and is the same assumption most audio compression relies on. It may be able to clean up the sound a little. That said, the results they have obtained so far are very impressive.

The data comes in faster than 60 fps. A camera sensor doesn't capture the entire frame instantly every 1/60 second. It progressively scans through the frame over some measurable fraction of that 1/60 second. This is that quirk.

Suppose the camera scans 720 lines in HD every 1/60 second. Each row is then offset in time from the previous one by 1/43200 second (1/60 divided by 720). A rigid object could be slightly offset in space on each line of pixels, indicating that sound waves perturbed it in the time gap between when the camera captured each line. So that subframe video data can be turned back into audio at a much higher frequency than the apparent 60 Hz video sampling rate.

In other words, we're not just talking about 60 frames-per-second from a camera. It's really perhaps 43,200 rows per second, an enormously higher sampling frequency.
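The arithmetic above can be checked with a few lines of Python. This is only a sketch of the idealised case described in this comment: it assumes the readout of 720 rows is spread evenly across the whole 1/60 second frame period, and the function names are made up for illustration.

```python
from fractions import Fraction


def row_sample_rate(rows_per_frame: int, frames_per_second: int) -> int:
    """Rows captured per second, assuming readout fills the whole frame period."""
    return rows_per_frame * frames_per_second


def row_time_offset(rows_per_frame: int, frames_per_second: int) -> Fraction:
    """Time between consecutive rows, in seconds, under the same assumption."""
    return Fraction(1, rows_per_frame * frames_per_second)


rate = row_sample_rate(720, 60)
offset = row_time_offset(720, 60)
print(rate)    # 43200
print(offset)  # 1/43200
```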

> The data comes in faster than 60 fps

Yes, yes, that was completely obvious from the article. We are getting thousands of "measurements" per second.

However, each of those measurements is incredibly inaccurate. Each one is trying to detect a colour change of about 1/200 of the colour range in a single pixel. You may be getting less than a single bit of entropy per measurement.

An advanced signal processing technique will look at the longer-term picture. Sound vibrations are not a random walk. They tend to be a combination of sine waves, where the magnitude of each component changes much more slowly than the vibration itself. They are therefore predictable to a certain extent, and audio compression algorithms exploit this predictability. The signal processing algorithm has to take the extremely limited information coming from the measurements and find sets of slowly varying sine waves that could be causing them. This may be sufficient to reject some of the noise that we could hear on that video and clean up the sound a bit, but it is quite a hard (and CPU-intensive) processing task.
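To give a feel for why looking at the longer-term picture helps, here is a toy sketch. It is not the method from the article: it assumes a single sine wave at a frequency known in advance, constant amplitude, and Gaussian noise five times larger than the signal, all of which are simplifications chosen for illustration. Correlating many poor measurements against the expected sine and cosine is a least-squares fit when the window covers a whole number of periods, and it recovers the amplitude even though no single sample is useful on its own.

```python
import math
import random


def estimate_sine(samples, freq_hz, sample_rate_hz):
    """Fit A*sin(w*t + phase) at a known frequency by correlation.

    This equals the least-squares fit when the samples span a whole
    number of periods. Returns (amplitude, phase in radians).
    """
    n = len(samples)
    w = 2 * math.pi * freq_hz / sample_rate_hz
    a = 2 / n * sum(x * math.cos(w * i) for i, x in enumerate(samples))
    b = 2 / n * sum(x * math.sin(w * i) for i, x in enumerate(samples))
    return math.hypot(a, b), math.atan2(a, b)


# Hypothetical numbers: one second of row samples at 43,200 per second,
# a 440 Hz tone of amplitude 0.2, and noise with standard deviation 1.0.
random.seed(0)
RATE = 43200
FREQ = 440.0
TRUE_AMP = 0.2
noisy = [
    TRUE_AMP * math.sin(2 * math.pi * FREQ * i / RATE) + random.gauss(0.0, 1.0)
    for i in range(RATE)
]

amplitude, phase = estimate_sine(noisy, FREQ, RATE)
print(f"estimated amplitude: {amplitude:.3f} (true value {TRUE_AMP})")
```

A real recovery would have to search over many unknown frequencies with time-varying amplitudes, which is where the CPU cost comes from.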

Well, the reader would read as fast as it can.

Let's say it reads the entire image in 1/120 second, then waits and does nothing for another 1/120 second before it starts reading the next frame.

The real number would be significantly smaller. Therefore they cannot bump the sample rate by more than five or six times. And I imagine they are already using some intelligent algorithm to space out the captured samples evenly.
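The read-then-wait scenario can be made concrete by listing when each row would be captured. This is a sketch using the hypothetical figures from this comment (720 rows, 60 fps, readout taking half the frame period); `readout_fraction` is an illustrative parameter, not a value taken from any real sensor.

```python
import math


def row_timestamps(rows, fps, readout_fraction, frames):
    """Capture time in seconds of every row over a number of frames.

    readout_fraction is the share of each frame period spent reading rows;
    the sensor is assumed idle for the remainder of the period.
    """
    frame_period = 1.0 / fps
    row_period = frame_period * readout_fraction / rows
    return [
        f * frame_period + r * row_period
        for f in range(frames)
        for r in range(rows)
    ]


ts = row_timestamps(rows=720, fps=60, readout_fraction=0.5, frames=2)

within_burst = ts[1] - ts[0]         # spacing between rows of one frame
between_frames = ts[720] - ts[719]   # last row of frame 0 to first row of frame 1

print(f"row spacing inside a frame: {within_burst:.3e} s")
print(f"gap before the next frame:  {between_frames:.3e} s")
```

With these figures the rows arrive in bursts: closely spaced while the frame is being read, followed by a gap of roughly 1/120 second with no samples at all, so the samples are far from evenly spaced.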