Might be interesting to experiment with different mappings of pixel location to audio sample number, rather than just having a row by row linear scan from corner to corner.
I am curious. I do not know much about image encoding and such. What other mappings are there besides a linear scan, either row by row or column by column? It seems to me that those two are the only logical ways to map pixels.
Any permutation of the pixels could be used, but I suppose you'd want ones that form some sort of visually recognisable pattern. For example, a spiral emerging from the centre of the image, or all the even-indexed pixels from a linear scan followed by all the odd-indexed ones.
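To make that concrete, here is a rough sketch of the two orderings mentioned above, next to the plain row-by-row scan. The function names and the `scan` helper are made up for illustration, and it assumes the image is a list of rows with the origin at the top left; treat it as one possible way to do it rather than anything definitive.

```python
def row_major(width, height):
    """Baseline: scan row by row, left to right, top to bottom."""
    return [(x, y) for y in range(height) for x in range(width)]


def even_then_odd(width, height):
    """Even-indexed pixels of the linear scan, then the odd-indexed ones."""
    linear = row_major(width, height)
    return linear[0::2] + linear[1::2]


def spiral_from_centre(width, height):
    """Square spiral outward from the centre, skipping out-of-bounds steps."""
    x, y = (width - 1) // 2, (height - 1) // 2
    dx, dy = 1, 0          # start by heading right
    run = 1                # current run length along one side of the spiral
    order = []
    total = width * height
    while len(order) < total:
        for _ in range(2):             # each run length is used for two sides
            for _ in range(run):
                if 0 <= x < width and 0 <= y < height:
                    order.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx           # turn 90 degrees
        run += 1
    return order


def scan(pixels, order):
    """Read pixel values in the given order to get the sample sequence."""
    return [pixels[y][x] for x, y in order]
```

Each function returns a list of (x, y) coordinates, so sample number n simply comes from the pixel at `order[n]`. Since every ordering is a permutation of the same set of coordinates, the mapping can be inverted to turn the audio back into an image.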
Right now I've mostly been exploring the YUV and RGB colorspaces, in either packed or planar formats.