by barrkel 4946 days ago
Python's bytecode is for a stack machine, if I'm not mistaken, and such bytecode is effectively a serialization format for ASTs: a post-order traversal of each expression tree. Interpret stack machine bytecode symbolically and you reconstruct the AST:

Compilation:

  1 + 2 => (+ 1 2) => push 1, push 2, add

Interpretation:

  push 1 => 1
  push 2 => 2, 1
  add    => (+ 1 2)

Control flow makes things slightly more complicated, but not by much when the code generator emits predictable patterns.
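
To make the round trip concrete, here is a minimal sketch in Python. The instruction names (`push`, `add`, `mul`) and the helpers `compile_expression` and `rebuild_expression` are made up for illustration; this is a toy stack machine, not CPython's actual opcode set, and it only handles binary expressions.

```python
# Toy instruction set (hypothetical, not CPython opcodes).
BINARY_OPS = {"add": "+", "mul": "*"}
OP_FOR_SYMBOL = {symbol: op for op, symbol in BINARY_OPS.items()}


def compile_expression(tree):
    """Emit stack-machine code via a post-order traversal of the tree.

    A tree is either a leaf value or a tuple (symbol, left, right).
    """
    if not isinstance(tree, tuple):
        return [("push", tree)]
    symbol, left, right = tree
    return (
        compile_expression(left)
        + compile_expression(right)
        + [(OP_FOR_SYMBOL[symbol],)]
    )


def rebuild_expression(instructions):
    """Run the code symbolically: the stack holds subtrees, not values."""
    stack = []
    for op, *args in instructions:
        if op == "push":
            stack.append(args[0])
        elif op in BINARY_OPS:
            right = stack.pop()  # the top of the stack is the right operand
            left = stack.pop()
            stack.append((BINARY_OPS[op], left, right))
        else:
            raise ValueError(f"unknown instruction: {op}")
    if len(stack) != 1:
        raise ValueError("code does not leave exactly one expression")
    return stack[0]


code = compile_expression(("+", 1, 2))
tree = rebuild_expression(code)
```

Here `code` is the `push 1, push 2, add` sequence from the example above, and `tree` comes back as `("+", 1, 2)`. A real decompiler would also need statements, calls, and control flow on top of this.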

Obfuscated bytecode that, for example, doesn't maintain a consistent interpreter stack depth on every code path (illegal on the JVM or the .NET CLR) would be a little harder to analyze, but I doubt that's often the case in practice with Python.
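
As a rough illustration of what "consistent stack depth" means, the sketch below walks every code path of a toy program and checks that each instruction is always reached with the same depth. The instruction set, the `STACK_EFFECT` table, and `check_stack_depths` are all assumptions invented for this example; real verifiers (JVM, CLR) check much more, including operand types.

```python
# Net change in stack depth per instruction (hypothetical toy instruction set).
STACK_EFFECT = {
    "push": 1,
    "add": -1,            # pops two operands, pushes one result
    "jump": 0,
    "jump_if_false": -1,  # pops the condition
    "return": -1,         # pops the return value
}


def check_stack_depths(code):
    """Return True if every instruction is reached with one consistent depth."""
    depth_at = {}          # instruction index -> depth on entry
    worklist = [(0, 0)]    # (instruction index, depth on entry)
    while worklist:
        pc, depth = worklist.pop()
        if pc >= len(code):
            return False   # treat falling off the end as malformed
        if pc in depth_at:
            if depth_at[pc] != depth:
                return False  # two paths disagree about the depth here
            continue
        depth_at[pc] = depth
        op, *args = code[pc]
        new_depth = depth + STACK_EFFECT[op]
        if new_depth < 0:
            return False   # stack underflow
        if op == "return":
            continue       # no successor
        if op == "jump":
            worklist.append((args[0], new_depth))
            continue
        if op == "jump_if_false":
            worklist.append((args[0], new_depth))  # branch taken
        worklist.append((pc + 1, new_depth))       # fall through
    return True


# Both branches push exactly one value before the shared return.
well_formed = [
    ("push", "cond"),
    ("jump_if_false", 4),
    ("push", 1),
    ("jump", 5),
    ("push", 2),
    ("return",),
]

# One path reaches instruction 3 with an extra value on the stack.
inconsistent = [
    ("push", "cond"),
    ("jump_if_false", 3),
    ("push", 1),
    ("push", 2),
    ("return",),
]

well_formed_ok = check_stack_depths(well_formed)
inconsistent_ok = check_stack_depths(inconsistent)
```

Code that fails this kind of check can't be mapped back to an expression tree by simple symbolic execution, because the "current expression" at a join point depends on which path was taken.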

1 comments

Yeah, the reconstruction isn't hard at all, but it's not a direct 1:1 mapping to the AST, since multiple control flow structures in the AST can become the same thing in bytecode. That said, it's quite simple to make it Good Enough (TM); the reason I wrote that and the RMarshal module was that I was writing a Python decompiler as part of a larger commercial project. I should release the decompiler at some point.