I'd be curious to know if/how the Clojure code could be written to do this kind of number crunching on GPU. Penumbra has a very idiomatic library for offloading work to the GPU.
I don't have time for a full conversion just now, but an eight-way diffusion on the GPU in Penumbra looks like this:
(let [sum 0.0
      count 0.0]
  (convolution 1
    (+= sum %1)
    (+= count 1))
  (/ sum count))
"convolution" is a keyword that iterates over the neighbors within the given radius (1, in this case) without overrunning the boundaries of the source textures. Pixels at the edges therefore see fewer neighbors than interior ones, hence the need to keep a running count.
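To make the running count concrete, here is a rough CPU-side sketch of the same averaging in plain Clojure. It makes no Penumbra calls: `diffuse` and the vector-of-vectors grid are names I made up for illustration. It also assumes the window includes the center cell, which I haven't verified against Penumbra's actual behavior.

```clojure
;; CPU reference for the averaging kernel: plain Clojure, no Penumbra calls.
;; `diffuse` and the vector-of-vectors grid are illustrative assumptions.
(defn diffuse
  "Replaces each cell with the mean of the in-bounds cells in the
  (2*radius+1)-square window centered on it.
  Assumption: the window includes the center cell."
  [grid radius]
  (let [rows (count grid)
        cols (count (first grid))]
    (vec
      (for [r (range rows)]
        (vec
          (for [c (range cols)]
            (let [neighbors (for [dr (range (- radius) (inc radius))
                                  dc (range (- radius) (inc radius))
                                  :let [nr (+ r dr)
                                        nc (+ c dc)]
                                  ;; skip offsets that fall outside the grid
                                  :when (and (< -1 nr rows) (< -1 nc cols))]
                              (get-in grid [nr nc]))]
              ;; divide by the number of cells actually visited,
              ;; which is smaller at edges and corners
              (/ (reduce + 0.0 neighbors) (count neighbors)))))))))

(diffuse [[0.0 0.0 0.0]
          [0.0 9.0 0.0]
          [0.0 0.0 0.0]] 1)
;; => [[2.25 1.5 2.25] [1.5 1.0 1.5] [2.25 1.5 2.25]]
```

The corners divide by 4, the edges by 6, and the center by 9. A fixed divisor would give the wrong average everywhere except the interior.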
For a more extensive example, see this implementation of Sobel edge detection on the GPU: http://github.com/ztellman/penumbra/blob/master/src/example/...