Analyzing unknown binary files using information entropy


	Analyzing unknown binary files using information entropy (yurichev.com)
	32 points by egorst 4054 days ago

3 comments

woliveirajr 4054 days ago

There is a technique called "normalized compression distance" (NCD) that does something along these lines. It uses a compressor to measure how similar one piece of data is to another.
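As a rough illustration of the idea (not taken from the linked article), NCD is usually written as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below assumes zlib as the compressor; the function names are made up for this example, and a stronger compressor would give different numbers.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the data after zlib compression (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share a lot of structure;
    values near 1 suggest they have little in common.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    print("text vs itself:", ncd(text, text))
    print("text vs other: ", ncd(text, other))
```

With a real compressor the distance of something to itself is small but not exactly zero, and zlib's 32 KB window limits how far back it can find matches, so very large inputs would need a different compressor.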

For a similar problem, you could take the approach described in this answer: http://reverseengineering.stackexchange.com/questions/2897/t...


snarfy 4054 days ago

I always thought this idea could be greatly expanded upon.

I've seen it used to guess the natural language of a text file from how well it compresses. I always believed this could be used as a sort of universal translator. You could compress audio recordings of birds, throw this algorithm at them, and extract meaningful content.
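The language-guessing trick can be sketched roughly like this: append the unknown text to a reference sample for each language and see which sample makes it cheapest to compress. Everything here is an assumption for illustration: zlib as the compressor, the helper names, and the toy `SAMPLES` corpora, which are far too small for real use.

```python
import zlib
from typing import Mapping

# Toy reference corpora, invented for this sketch; real ones would be much larger.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then "
               "it runs through the forest where the old trees stand",
    "german": "der schnelle braune fuchs springt ueber den faulen hund und dann "
              "rennt er durch den wald wo die alten baeume stehen",
}


def extra_bytes(reference: bytes, text: bytes) -> int:
    """Extra compressed bytes needed for text once the reference has been seen."""
    base = len(zlib.compress(reference, 9))
    combined = len(zlib.compress(reference + b" " + text, 9))
    return combined - base


def guess_language(text: str, samples: Mapping[str, str] = SAMPLES) -> str:
    """Pick the language whose sample makes the text cheapest to compress."""
    data = text.encode("utf-8")
    return min(
        samples,
        key=lambda lang: extra_bytes(samples[lang].encode("utf-8"), data),
    )


if __name__ == "__main__":
    print(guess_language("the quick brown fox jumps over the lazy dog"))
```

This only picks among the languages it has samples for; it says nothing about meaning, so it is a long way from translation.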


woliveirajr 4054 days ago

To tell the truth, it has already been proposed and there are some papers about it. Search for "NCD - Normalized compression distance" and you'll get some results. It has even been used to work out who wrote a document (authorship attribution). Very interesting.


rasz_pl 4054 days ago

Cantor.Dust - the future was here, but turned out to be vaporware :(

https://www.youtube.com/watch?v=4bM3Gut1hIk
