Beautiful work as usual. The entropy visualisations in particular stuck out for me, and I can imagine how useful it is to be able to see this kind of information at a glance while dissecting malware.
Cheers. :) I'm trying very, very hard to resist the urge to start writing a binary dissection tool based on this. I'm picturing a hex viewer with a space-filling curve navigation pane, with options to switch between different pixel layouts, and color maps. Then again, the last thing I need is another side-project.
Do most binaries have this high ratio of encrypted/obfuscated content? That is, would a checker which simply looked at high-entropy fraction of a binary be able to detect malware not in its database? Obviously this is trivial to defeat, but it might be the case that a broad defense such as this would force malware authors to let their code grow much bigger, which might in turn lead to other generic signatures. Would it be worthwhile?
Unfortunately perfectly legitimate compressed sections are both very entropic and very common, so just an entropy measurement won't be very useful for malware detection. I think there might be some promise in looking at patterns of entropy, though. What you can't see in that malware post is that very many pieces of malware show very similar layouts of entropy, even if the malware itself is very different. I weeded out similar images so that people could get an idea of the range of layouts, but the 50 visualizations on the blog are a "condensed" version of about 400 highly redundant images. Part of this is because different types of malware can still use the same "packer" - tools that pack a binary to obfuscate malware and make it hard to detect and reverse engineer. Part of it is just because there are certain techniques that are commonly used. All of this requires more study, but it's interesting.