by openasocket 3365 days ago
This is for obfuscating string constants; the foreign languages included are a red herring. The reason for this is that nontrivial code often has string constants in it, and the string contents are stored in the ELF/PE file in a manner that makes them trivial to extract. Since these strings often reveal a lot about the malware (e.g. a string constant "Your computer has been infected with ransomware. Please deposit %d bitcoins to address %s"), antivirus signatures often use them to detect specific kinds of malware, and reverse engineers find them useful in determining what a binary does. This framework scrambles the string contents (using techniques like XOR-ing every character against a random key) and injects some code into the executable so that the strings are unscrambled on startup. They just have foreign languages in the example to demonstrate that the framework correctly handles Unicode.
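
To make the XOR idea concrete, here is a minimal sketch in Python. It is not the framework's actual code: the function names (`xor_bytes`, `scramble`, `unscramble`) and the 16-byte key length are my own assumptions for illustration. The point is only that the plaintext never appears in the stored blob, and that working on UTF-8 bytes means non-ASCII text round-trips the same way ASCII does.

```python
import os


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data against the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def scramble(text: str, key: bytes) -> bytes:
    # Encode to UTF-8 first so the XOR operates on bytes, not code points.
    return xor_bytes(text.encode("utf-8"), key)


def unscramble(blob: bytes, key: bytes) -> str:
    # XOR is its own inverse: applying the same key restores the bytes.
    return xor_bytes(blob, key).decode("utf-8")


if __name__ == "__main__":
    key = os.urandom(16)  # hypothetical key length, chosen for the example
    message = "Please deposit %d bitcoins to address %s"
    blob = scramble(message, key)
    print(blob.hex())             # what a strings-style scan would see
    print(unscramble(blob, key))  # what the program recovers at startup
```

In a real binary the key and the unscrambling routine ship alongside the blob, so this only defeats naive string extraction and static signatures, not someone who actually runs or disassembles the code.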

Analysts never use the language of the code comments for attribution, because such things are trivial to forge.

2 comments

Considering that debug symbols, comments in code, and Cyrillic characters in file metadata are being used as solid evidence that Russia hacked the DNC, I'd say that it's probably still a useful tool.
Source? I've read the stuff CrowdStrike and Mandiant have put out and they mentioned none of those as evidence. Just binary analysis and network indicators from what I've seen.
Thanks for this insight! I'll edit my comment to credit you, but I won't delete it since someone might have the same thought process as me.

My comment:

So I see now (thanks to you) that it is just showing test cases (test warbles) to demonstrate that these scrambling techniques work with foreign languages. However, why would the US gov need to make sure that this program can successfully obfuscate Unicode strings in Chinese/Russian/Arabic/Farsi?

My gut reaction: while code comments would be trivial to forge, it appears the US gov is still using foreign-language strings in some way, maybe having just one string constant originally in a foreign language that is then obfuscated/scrambled (such as by XOR-ing every char against a random key).

Just FYI: those Chinese characters are really, really rarely used in any writing. In fact, anyone with Chinese reading comprehension will tell you those are gibberish words and none of them make any sense.