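There is a technique called "normalized compression distance" (NCD) that does something like this. It uses a compressor to estimate how similar two pieces of data are: if compressing them together takes barely more space than compressing one of them alone, they share a lot of structure.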
I always thought this idea could be greatly expanded upon.
I've seen it used to guess the language of a text file from how well it compresses. I always believed this could be used as a sort of universal translator. You could take recordings of bird sounds, throw this algorithm at them, and extract meaningful content.
In fact, this has already been proposed and there are some papers about it. Search for "NCD - Normalized compression distance" and you'll get some results. It's even used for authorship attribution, i.e. checking who wrote a given document. Very interesting.
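For anyone who wants to play with it, here is a rough sketch of the usual NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. Using zlib as the compressor is just my assumption for the demo (any real compressor gives only an approximation), and the function names and sample strings are made up.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between x and y.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they have little in common. With a real
    compressor the result can land slightly outside [0, 1].
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical sample data, only for illustration.
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"colorless green ideas sleep furiously at night " * 20
    print("ncd(a, a) =", ncd(a, a))
    print("ncd(a, b) =", ncd(a, b))
```

Comparing a text against itself should give a much smaller distance than comparing it against an unrelated text. For real use you would want longer inputs and probably a stronger compressor, since zlib only looks back 32 KB.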
For a similar problem, you can approach it the way it was answered here: http://reverseengineering.stackexchange.com/questions/2897/t...