Hacker News
Analyzing unknown binary files using information entropy (yurichev.com)
32 points by egorst 4054 days ago
3 comments

There is a technique called "normalized compression distance" (NCD) that does something like this. It uses a compressor to measure how similar one piece of data is to another.

For a similar problem, you can follow the approach from this answer: http://reverseengineering.stackexchange.com/questions/2897/t...
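
As a rough sketch of the idea (not taken from the linked answer): NCD is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The Python below assumes zlib as the compressor, and the function names are made up for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest similar inputs, values near 1 unrelated ones.
    zlib is an assumption here; any real compressor can be substituted.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Since zlib only looks back 32 KB, this is only meaningful for small inputs. For larger files a compressor with a bigger window (lzma, for instance) would probably be a better fit.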

I always thought this idea could be greatly expanded upon.

I've seen it used to guess which language a text file is written in, based on how well it compresses. I always believed this could be used as a sort of universal translator. You could compress audio recordings of birds, throw this algorithm at them, and extract meaningful content.
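
A minimal sketch of how the language-guessing trick might work, assuming zlib and one small reference text per language: append the unknown sample to each reference and pick the language where it adds the fewest compressed bytes. The names `extra_bytes` and `guess_label` and the reference-text setup are hypothetical, and none of this says whether the approach would get anywhere with bird song.

```python
import zlib


def extra_bytes(reference: bytes, sample: bytes) -> int:
    """Compressed bytes the sample adds when appended to the reference."""
    with_sample = len(zlib.compress(reference + sample, 9))
    alone = len(zlib.compress(reference, 9))
    return with_sample - alone


def guess_label(sample: bytes, references: dict[str, bytes]) -> str:
    """Return the label whose reference makes the sample cheapest to compress."""
    return min(references, key=lambda label: extra_bytes(references[label], sample))


# Toy reference texts; real use would need much larger samples per language.
references = {
    "en": b"the quick brown fox jumps over the lazy dog " * 20,
    "de": b"zwoelf boxkaempfer jagen viktor quer ueber den grossen sylter deich " * 20,
}
print(guess_label(b"the lazy dog jumps over the quick brown fox", references))
```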

In fact, this has already been proposed, and there are papers about it. Search for "NCD - Normalized compression distance" and you'll get some results. It's even used for authorship attribution, i.e. working out who wrote a given document. Very interesting.

Cantor.Dust - the future was here, but turned out to be vaporware :(

https://www.youtube.com/watch?v=4bM3Gut1hIk