Imagine the waveform of a sound wave: there is an amplitude at each point in time, and you can easily take the FT of it. The waveform could just as well be, say, a vibrating guitar string frozen at one moment in time, in which case it is amplitude vs. distance, but it is still just a function you can take the FT of (now the "frequency" domain is the inverse of distance rather than the inverse of time, but it still works). An image is also just amplitude vs. distance, except with 2 distance variables since an image is 2-dimensional; the idea is the same, and the Fourier transform is defined for functions of 2 or more variables. In a color JPEG, I believe a close cousin of the FT (the discrete cosine transform) is applied to small blocks of each color channel separately (typically one brightness and two color-difference channels rather than red, green, and blue), and how aggressively the high-frequency coefficients are rounded off or dropped determines image quality.
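To make the "throw away the high frequencies" idea concrete, here is a minimal sketch in plain Python. It is not how JPEG actually does it; it just takes a naive 2D DFT of a toy 8x8 image, zeroes everything above a cutoff, and measures how far the reconstruction is from the original. The names `dft2`, `low_pass`, `rms_error`, and the toy image are all made up for illustration.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D DFT of a list of lists (O(N^4), so tiny images only)."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = 2 * cmath.pi * (u * y / rows + v * x / cols)
                    total += img[y][x] * cmath.exp(sign * 1j * angle)
            # The inverse transform carries the 1/(rows*cols) normalisation.
            out[u][v] = total / (rows * cols) if inverse else total
    return out

def low_pass(coeffs, cutoff):
    """Zero every coefficient whose frequency exceeds `cutoff` on either axis."""
    rows, cols = len(coeffs), len(coeffs[0])

    def freq(k, n):
        # Index k and index n - k are the same frequency with opposite sign.
        return min(k, n - k)

    return [[c if freq(u, rows) <= cutoff and freq(v, cols) <= cutoff else 0j
             for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]

def rms_error(img, cutoff):
    """Root-mean-square difference between img and its low-passed version."""
    approx = dft2(low_pass(dft2(img), cutoff), inverse=True)
    rows, cols = len(img), len(img[0])
    total = sum(abs(approx[y][x] - img[y][x]) ** 2
                for y in range(rows) for x in range(cols))
    return (total / (rows * cols)) ** 0.5

# Toy 8x8 "image": a bright square on a dark background.
image = [[1.0 if 2 <= y < 6 and 2 <= x < 6 else 0.0 for x in range(8)]
         for y in range(8)]

# Cutoff 0 keeps only the average brightness; cutoff 4 keeps everything.
errors = [rms_error(image, cutoff) for cutoff in range(5)]
for cutoff, err in enumerate(errors):
    print(f"cutoff={cutoff}  rms error={err:.4f}")
```

The error can only shrink as the cutoff rises, and it drops to roundoff level once every coefficient is kept, which is the "more coefficients, better quality" trade-off in miniature.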
Another idea to grok: a checkerboard image, where adjacent pixels alternate between full black and full white along every row and every column, is the "highest frequency" image possible at that sample rate (pixel density). It is the 2D equivalent of a sinusoid sampled only at its peaks and troughs: +1, -1, +1, -1, and so on.
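You can check the checkerboard claim numerically. The sketch below (plain Python, with a made-up helper `dft2` and an 8x8 board using +1 for white and -1 for black, both my own choices) takes a naive 2D DFT and lists which coefficients are nonzero. All of the energy should land in the single bin at half the sample rate on both axes.

```python
import cmath

def dft2(img):
    """Naive 2D DFT of a list of lists (fine for tiny images)."""
    n_rows, n_cols = len(img), len(img[0])
    return [[sum(img[y][x]
                 * cmath.exp(-2j * cmath.pi * (u * y / n_rows + v * x / n_cols))
                 for y in range(n_rows) for x in range(n_cols))
             for v in range(n_cols)]
            for u in range(n_rows)]

N = 8
# +1 / -1 checkerboard: the sign flips with every step along a row or column.
checkerboard = [[(-1) ** (x + y) for x in range(N)] for y in range(N)]

spectrum = dft2(checkerboard)

# Collect the (row frequency, column frequency) bins that are not ~0.
nonzero = [(u, v) for u in range(N) for v in range(N)
           if abs(spectrum[u][v]) > 1e-9]
print(nonzero)  # [(4, 4)]
```

Bin (4, 4) on an 8x8 grid is N/2 on both axes, i.e. the highest frequency the grid can represent. With 0 for black and 1 for white instead, you would also get a nonzero (0, 0) bin for the average brightness.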
I learnt Fourier transforms as something done to 2D images before I learnt any other applications, so I can draw the FT of a simple image by hand, but I really struggle to get my head around other applications :(
That said, this picture is awesome! I've never seen it done before.