by santaragolabs 3380 days ago
Yep, and they change things around every once in a while, too. I've RE'd Dropbox several times using different techniques. I just checked my old tarball, which contains a script that downloads the Dropbox binary and the Python interpreter, builds the opcode table database and then decompiles everything:

  gvb@santarago:/tmp/lookinsidethebox$ ./run.sh
  fetched all dependencies..lets try decompiling
  no saved opcode mapping found; try to generate it
  pass one: automatically generate opcode mapping
  108 opcodes
  pass two: decrypt files, patch bytecode and decompile
  1928/1928
  successfully decrypted and decompiled: 1727 files
  error while decrypting: 0 files
  error while decompiling: 196 files
  opcode misses: 7 total 0x6c (108) [#9],  0x2c (44) [#14],  0x8d (141) [#15],  0x2e (46) [#1],  0x2d (45) [#14],  0x30 (48) [#5],  0x71 (113) [#11783],  
A starting point for doing this yourself is https://github.com/rumpeltux/dropboxdec. In newer Dropbox installations, after unmarshalling the new pyc files, the seed read in via the rng() function is passed through a Mersenne twister. Four DWORD values are read from it, and those are used to construct the key for the Tiny Encryption Algorithm (TEA) cipher.
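The key-derivation step described above can be sketched roughly as follows. The comment doesn't spell out the exact seeding call, word order, TEA variant, byte order or block mode, so this sketch assumes: the reference MT19937 seeded with `init_genrand(seed)`, the first four outputs taken in order as key words k0..k3, and textbook 32-cycle TEA in ECB mode over little-endian 8-byte blocks. `derive_tea_key` and the other names are hypothetical; check dropboxdec for the real details.

```python
import struct

MASK = 0xFFFFFFFF


class MT19937:
    """Reference 32-bit Mersenne Twister, seeded with init_genrand(seed)."""

    def __init__(self, seed: int):
        self.mt = [0] * 624
        self.index = 624
        self.mt[0] = seed & MASK
        for i in range(1, 624):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK

    def _twist(self):
        for i in range(624):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % 624] & 0x7FFFFFFF)
            self.mt[i] = self.mt[(i + 397) % 624] ^ (y >> 1)
            if y & 1:
                self.mt[i] ^= 0x9908B0DF
        self.index = 0

    def next_u32(self) -> int:
        if self.index >= 624:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK


def derive_tea_key(seed: int):
    """Hypothetical: first four MT outputs, in order, as TEA key words k0..k3."""
    mt = MT19937(seed)
    return tuple(mt.next_u32() for _ in range(4))


DELTA = 0x9E3779B9


def tea_encrypt_block(v0, v1, key):
    """Textbook TEA encryption (32 cycles); included for round-trip checks."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Textbook TEA decryption: undo the 32 cycles in reverse order."""
    k0, k1, k2, k3 = key
    s = (DELTA * 32) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1


def tea_decrypt(data: bytes, key) -> bytes:
    """ECB over little-endian 8-byte blocks (an assumption, not confirmed)."""
    if len(data) % 8:
        raise ValueError("ciphertext length must be a multiple of 8")
    out = bytearray()
    for off in range(0, len(data), 8):
        v0, v1 = struct.unpack_from("<2I", data, off)
        out += struct.pack("<2I", *tea_decrypt_block(v0, v1, key))
    return bytes(out)
```

The Mersenne twister part is standard (seeded with 5489 its first output is 3499211612, the same as C++'s `std::mt19937`); the parts most likely to differ from Dropbox's real scheme are how the seed is read and how the four words are arranged into the key.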

After that you get the binary blob back, which you can now unmarshal. But you still need to figure out the opcode mapping. For that I used a trick first done publicly (to the best of my knowledge) by Rich Smith, the author of PyREtic, released at Black Hat 2010. He simply compares the stdlib pyc files with the stdlib included in Dropbox (after decrypting those pyc files) byte by byte, which should yield a mapping of opcodes.
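A rough sketch of that byte-by-byte comparison, assuming you already have matching pairs of `co_code` byte strings (one from Dropbox's decrypted stdlib, one compiled by a stock interpreter of the same version) and the CPython 2.x layout where opcodes >= 90 carry a 2-byte argument. How the pairs are collected, and all function names here, are hypothetical; a real tool also has to walk nested code objects.

```python
from collections import Counter, defaultdict

# Assumption: CPython 2.x bytecode layout, where an opcode >= 90
# (HAVE_ARGUMENT) is followed by a 2-byte argument.
HAVE_ARGUMENT = 90


def collect_opcode_votes(pairs):
    """Tally which standard opcode each remapped opcode byte lines up with.

    `pairs` yields (remapped_co_code, standard_co_code) byte strings for
    the same function compiled by the same Python version.
    """
    votes = defaultdict(Counter)
    for remapped, standard in pairs:
        if len(remapped) != len(standard):
            continue  # different layout; byte-by-byte alignment is meaningless
        i = 0
        while i < len(standard):
            std_op = standard[i]
            votes[remapped[i]][std_op] += 1
            # Step over argument bytes using the *standard* opcode's width.
            i += 3 if std_op >= HAVE_ARGUMENT else 1
    return votes


def build_mapping(votes):
    """For each remapped opcode, pick the standard opcode it matched most often."""
    return {rop: c.most_common(1)[0][0] for rop, c in votes.items()}


def conflicts(votes):
    """Remapped opcodes that lined up with more than one standard opcode."""
    return {rop: dict(c) for rop, c in votes.items() if len(c) > 1}
```

Majority voting over many files smooths out the occasional pair that doesn't really correspond; the `conflicts` output is where manual inspection would start.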

Then pass everything through uncompyle2 and you've got pretty readable source code back. Some files will refuse to decompile, which means hand-editing or fine-tuning the last few entries of your opcode table.
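Before files go to uncompyle2, the remapped bytecode has to be translated back to standard opcodes (the "patch bytecode" step in the output above). A minimal sketch, again assuming the CPython 2.x layout and a mapping like one built from the stdlib comparison; `remap_code` is a hypothetical name, and a real tool would apply it recursively to nested code objects in `co_consts` and re-marshal the result with Python 2.

```python
from collections import Counter

# Assumption: CPython 2.x bytecode layout (opcode >= 90 takes a 2-byte argument).
HAVE_ARGUMENT = 90


def remap_code(code: bytes, mapping: dict):
    """Translate remapped opcodes back to standard ones.

    Returns the patched bytes plus a Counter of opcode bytes that had no
    entry in `mapping` (candidates for hand-editing the table).
    """
    out = bytearray()
    misses = Counter()
    i = 0
    while i < len(code):
        op = code[i]
        std_op = mapping.get(op)
        if std_op is None:
            misses[op] += 1
            # Unknown opcode: keep it as-is and guess its width from the raw
            # byte; if the guess is wrong, everything after it is misaligned.
            std_op = op
        out.append(std_op)
        width = 3 if std_op >= HAVE_ARGUMENT else 1
        out += code[i + 1:i + width]  # copy argument bytes unchanged
        i += width
    return bytes(out), misses
```

A `misses` tally like this is the kind of data an "opcode misses" summary such as the one printed above could be built from; a few unmapped opcodes can derail a whole file, which is why some files fail to decompile until the table is fixed by hand.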

EDIT: follow-up on the parent comment: the encryption keys are not in the interpreter. The interpreter is patched not to expose co_code and more (to make this kind of memory dumping more difficult; injecting a shared object is a different technique that I used too). It's also patched to use the different opcode mapping and to change how pyc files are unmarshalled when they are loaded. However, the key for each pyc file is derived strictly from data in those files themselves. That's pretty clear when you load the binary in IDA Pro and compare its unmarshalling code with a standard Python interpreter's code.


I thought they would have added more obfuscation after that paper was published. Since opcode generation and parsing are reset per pyc file, I expected to see things like:

- A rotating opcode table that changes every X opcodes

- Multiple opcodes that reference the same operation, selected randomly at generation time.
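To make the two suggestions concrete, here is a toy sketch that combines them. It is purely hypothetical: the comment describes what the author expected to see, not what Dropbox does, and every name and parameter below is made up. It works on a bare list of opcodes and ignores arguments. It shows what each idea changes: with aliases the decoding table becomes many-to-one instead of a permutation, and with rotation the table that applies depends on an instruction's position, so a single per-file opcode table is no longer enough.

```python
import random

# Hypothetical obfuscation scheme; not Dropbox's format.
# Toy simplification: operates on a list of opcodes and ignores arguments.


def build_alias_tables(opcodes, aliases, num_tables, seed):
    """Build `num_tables` tables; in each, every opcode gets `aliases` byte values."""
    opcodes = list(opcodes)
    if len(opcodes) * aliases > 256:
        raise ValueError("not enough single-byte values for that many aliases")
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        codes = list(range(256))
        rng.shuffle(codes)
        enc = {op: codes[k * aliases:(k + 1) * aliases] for k, op in enumerate(opcodes)}
        dec = {b: op for op, bs in enc.items() for b in bs}
        tables.append((enc, dec))
    return tables


def encode(ops, tables, every, rng):
    """Switch to the next table every `every` opcodes; pick an alias at random."""
    out = []
    for i, op in enumerate(ops):
        enc, _ = tables[(i // every) % len(tables)]
        out.append(rng.choice(enc[op]))
    return out


def decode(encoded, tables, every):
    """Invert `encode`: the table is chosen by position, any alias maps back."""
    return [tables[(i // every) % len(tables)][1][b] for i, b in enumerate(encoded)]
```

Note the space constraint with single-byte opcodes: two aliases per opcode only fits if at most 128 distinct opcodes are in use.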