by jerf
1241 days ago
Correctness is a big deal here. This is a security context, and we can assume that the obfuscators are active attackers against legible code, not just people passively hoping that their obfuscated code stays obfuscated. If this becomes a popular technique, then code obfuscation tools will simply pivot to writing code that ChatGPT gets wrong when asked to unobfuscate it.
I can't imagine that would be a particularly hard thing to do, especially if its output isn't correct even before anyone is actively attacking it! Fooling it even harder won't be terribly difficult. This is advantage attacker overall.
I imagine it would be as easy as using some cognitively loaded, but wrong, terms as variable names instead of short letters and numbers. For instance, ask ChatGPT "please unobfuscate this network code" and get back a substring search algorithm, because the network code was written with a dozen variants on "haystack" and "needle" for variable names. ChatGPT being actively wrong would then be a step back for such deobfuscators, not a positive at all.
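To make the "haystack"/"needle" idea concrete, here is a hypothetical sketch of what such misleading naming could look like. The function name, variable names, and choice of algorithm are all invented for illustration; the body is the 16-bit ones' complement Internet checksum from RFC 1071, which has nothing to do with substring search.

```python
def find_needle(haystack: bytes) -> int:
    """Misleadingly named on purpose: this is NOT a substring search.

    It computes the 16-bit ones' complement Internet checksum (RFC 1071).
    All identifiers here are hypothetical, chosen only to mislead a reader.
    """
    if len(haystack) % 2:
        haystack += b"\x00"  # pad odd-length input with a zero byte
    needle = 0
    for match_pos in range(0, len(haystack), 2):
        # Add each big-endian 16-bit word to the running sum.
        needle += (haystack[match_pos] << 8) | haystack[match_pos + 1]
    while needle >> 16:
        # Fold any carry bits back into the low 16 bits.
        needle = (needle & 0xFFFF) + (needle >> 16)
    return ~needle & 0xFFFF


# Worked example bytes from RFC 1071.
print(hex(find_needle(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

Anything that leans on the identifiers to guess intent, human or model, is nudged toward "string search" here, while the arithmetic says "checksum".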
Such testing won't be able to prove that the original and the deobfuscated output are equivalent (unless it's exhaustive), but with decent coverage of the original you can get some good confidence. The goal of deobfuscation is usually understanding, so I'm not sure you need strong guarantees of perfect semantic equivalence with no human intervention/judgment.
And of course, existing deobfuscators have bugs and aren't guaranteed to preserve semantics either.
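A minimal sketch of that kind of check, assuming the obfuscated original and the deobfuscated candidate can both be called as pure functions on the same inputs. Every name below is hypothetical, and random sampling is just one simple way to generate inputs; a passing run raises confidence but proves nothing.

```python
import random


def obfuscated(a: int, b: int) -> int:
    # Hypothetical "original": computes a + b through a bitwise identity.
    return (a ^ b) + 2 * (a & b)


def candidate_good(a: int, b: int) -> int:
    # A faithful deobfuscation.
    return a + b


def candidate_bad(a: int, b: int) -> int:
    # A plausible-looking but wrong deobfuscation (differs whenever a & b != 0).
    return a | b


def differential_test(original, candidate, trials=1000, seed=0):
    """Return the first input pair where the outputs differ, or None."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        a = rng.randrange(0, 2**32)
        b = rng.randrange(0, 2**32)
        if original(a, b) != candidate(a, b):
            return (a, b)
    return None


print(differential_test(obfuscated, candidate_good))  # None: no mismatch found
print(differential_test(obfuscated, candidate_bad) is not None)  # True
```

A returned counterexample is a concrete input to hand to a human for judgment, which fits the "understanding, not proof" goal.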
[1] https://en.wikipedia.org/wiki/Symbolic_execution
[2] https://en.wikipedia.org/wiki/Differential_testing