Hacker News new | ask | show | jobs
by matharmin 3376 days ago
This is about Dalvik bytecode format, but the same applies to standard Java bytebode files. Practically any obfuscated Java code will have this, which makes reverse engineering much more difficult without the tools to handle it.
2 comments

Why is it not as simple as mapping the method call instructions to returntype_methodname format? Am I missing something?
What if the call site does not use the returned value?
The bytecode encodes that information, the method identifier includes name, parameter types and return type.
It's still present in the bytecode (since the VM has to know which method to call).
Yes. Because of overloading there may be 2 functions with the same name and the same return type but different arguments. So you need to include them too.
No. Java itself handles the differing arguments.
Well one obfuscation technique is to change the argument types. So for instance, you might have foo(string). But after obfuscation, you move the code of foo(string) into foo(object). So a decompiler ends up showing:

  String s = "123";
  foo(s);
Java will call foo(string) if this is re-compiled, which is wrong. The decompiler would need to show foo((Object)s).

On the CLR you can go even further and erase a lot of the type info and just pass objects around. That is, you can just change all your method signatures to foo(object, object, object) no matter which classes (not structs) you pass. Or have even more fun and randomize the classes. So now foo(string) becomes foo(DBConnectionInfo).

For that specific example that would actually be fine in Java. Everything is an Object, so even if you have foo(Object) you can just pass it a String and that is then fine both for the JVM and the Java compiler.

Since the JVM identifies methods by their name + argument types this would most likely break overloading unless done very carefully. I've decompiled quite a few Java applications and haven't seen anything like this either, but perhaps I've just had the luck to not yet run into it.

It wouldn't be fine. It'd call the wrong method. The bytecode invokes the foo(object) one, but a naive decompiling would invoke foo(string). A cast would be required to get it to resolve correctly. Or am I misunderstanding?
Very interesting. Technically this should work in Dalvik opcodes too. But I do not know if there any runtime checks done by VM. I have never seen such technique used in Android.
Do you have some examples for tools to handle the deobfuscation stuff?