by tvdw 4946 days ago
For the record, distributing Python scripts doesn't have to mean distributing the source: it's possible to just execute the compiled .pyc files, which are harder to crack than (for example) Java's .class files.

Also, since xor is just a CPU instruction, you won't immediately notice it in the decompiled script (if you get that far). With all the overhead that decompilers tend to produce, it's really easy to miss.
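
One way to check this claim for Python specifically is to disassemble a function that uses xor and look for the operation by name. This is only a sketch under assumptions: `scramble_byte` is a made-up helper, and the opcode naming assumes CPython, where xor is `BINARY_XOR` up to 3.10 and `BINARY_OP` with a `^` argument from 3.11 on.

```python
import dis


def scramble_byte(value, key):
    # Hypothetical helper: a single xor of two runtime values.
    return value ^ key


# Collect every instruction that performs an xor, under either naming scheme.
xor_ops = [
    ins
    for ins in dis.get_instructions(scramble_byte)
    if ins.opname == "BINARY_XOR"
    or (ins.opname == "BINARY_OP" and ins.argrepr == "^")
]

for ins in xor_ops:
    print(ins.offset, ins.opname, ins.argrepr)
```

At the bytecode level the xor shows up as its own named instruction, so it is at least as visible as any other operator in the function.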

3 comments

You should really take a look at one of these .pyc files. They are so verbose that they even contain local variable names, and the Python code can be trivially decompiled from the bytecode.

Literally the only thing that goes missing from .py to .pyc is comments.
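
A small sketch of what that looks like in practice, assuming CPython 3. The function `check_license` and its variable names are invented for illustration; the point is only to see which strings survive compilation and marshaling.

```python
import marshal
import types

SOURCE = '''
def check_license(user_key, secret_salt):
    # this comment is not stored anywhere in the bytecode
    expected_digest = hash((user_key, secret_salt))
    return expected_digest
'''

# Compile the module, then pull the function's code object out of its constants.
module_code = compile(SOURCE, "example.py", "exec")
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

print(func_code.co_varnames)  # argument and local variable names
print(func_code.co_names)     # global and attribute names the function refers to

# marshal.dumps produces the same kind of payload that follows a .pyc header.
blob = marshal.dumps(module_code)
print(b"expected_digest" in blob)  # is the local variable name in the payload?
print(b"this comment" in blob)     # is the comment text in the payload?
```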

That's what an assembly dump looks like to an experienced reverse engineer. Writing something in a "compiled language" because it's more secure is like XOR-ing your video with RANDOM_STRING and calling it DRM.
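
To make the XOR comparison concrete, here is a minimal sketch of why a repeating-key XOR is so weak. The "video" bytes are placeholder data, and the attack assumes the attacker knows (or can guess) as many leading plaintext bytes as the key is long, which is often realistic for file formats with fixed headers.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the key, repeating the key as needed.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"RANDOM_STRING"
video = b"placeholder header bytes" + b"frame data " * 20  # stand-in content
protected = xor_bytes(video, key)

# Assumption: the attacker knows the first len(key) bytes of the plaintext.
known_prefix = video[: len(key)]
recovered_key = xor_bytes(protected[: len(key)], known_prefix)

# With the key recovered, the whole stream decrypts.
decrypted = xor_bytes(protected, recovered_key)
print(recovered_key)
print(decrypted == video)
```

XOR is its own inverse, so XOR-ing the ciphertext with known plaintext yields the key directly; no analysis of the program is even needed.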

(Not that any DRM scheme can ever work, ever, but hey. At least some try to try.)

Can't work ever? Are you taking orders for hacked DirecTV cards? :)
I think you guys have two different definitions of "work". jrockway seems to be arguing from a technical perspective, but you're likely arguing from a practical perspective. Sure, DRM can "never work" in that there's always the analog loop and all that. But it's absolutely possible to make it so insanely complex and difficult that no one will ever break it; DirecTV has shown that angle works, without a doubt.
I think the distinction is between software and hardware DRM. DirecTV controls the entire hardware chain. This means they can do various proper encryption schemes (public/pre-shared key etc) that are actually near impossible to crack and make it really, really hard to obtain the key by making the key write-only in the crypto-chip.

In a pure software solution, the user controls the hardware, so any hiding of the key is subject to reverse engineering of the software.

There's also a distinction between access to the data stream vs the ability to make a duplicate of it.

For all of the success they've had in protecting DirecTV, if you've got a legitimate access card feeding HDMI data out, you can make a perfect digital copy of the video stream that has no copy protection whatsoever. So ultimately the DRM offers no protection for the media content companies (at least those that don't benefit from live performances like, say, sports games), though it does for the pipe provider, who will surely get his monthly satellite fees.

I agree with your comment, except that "difficult" doesn't necessarily imply "insanely complex".

Counter-example: we once timed a release of a very minor protection update to when the main attacker typically took a holiday. We got 6 weeks out of something trivial, buying more time to work on the major release to greet him when he returned.

It all comes down to economics. Buy a beater bike, and sure, you can secure it well enough. (It gets stolen rarely enough that you don't care too much.) Buy a really nice one that everyone wants, and good luck with that.

Popular, recently produced media has too much value to too many attackers to protect. A celebrity's self shots -- same thing. A game console by Microsoft or Sony -- same thing.

> But it's absolutely possible to make it so insanely complex and difficult that no one will ever break it

A more accurate way to put it: If you make the return on effort ratio low enough, the probability of someone breaking it goes down, and it might even go down enough for you to get away with it for a useful amount of time.

It's not that simple. Sure, unpopular systems are more obscure and less likely to attract attention, but you're wrong in extending that to "if it's popular, it will be broken" (denying the antecedent).

As a counter-example, I propose DirecTV or even their competitor, Dish Network (Nagravision). Hacks of these systems are worth 6 figures, pay TV is widely desired, and there hasn't been a DTV hack since 2004. None.

.pyc files are actually really easy to decompile; it's just that most people have never encountered the tools required to do it. I believe they literally contain the entire abstract syntax tree for the Python source code.
No, they contain marshaled bytecode. I documented the format at http://daeken.com/python-marshal-format a while back (should still be more or less correct).
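
For anyone who wants to see this first-hand, here is a rough sketch that writes a tiny module, compiles it to a .pyc, and loads the code object back with `marshal`. The 16-byte header size is an assumption that holds for CPython 3.7 and later (older versions used shorter headers), and the file names and the `add` function are made up for the example.

```python
import dis
import importlib.util
import marshal
import os
import py_compile
import tempfile

HEADER_SIZE = 16  # assumption: CPython 3.7+ (magic, flags, then 8 more bytes)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.py")
    pyc_path = os.path.join(tmp, "example.pyc")
    with open(src_path, "w") as f:
        f.write("def add(a, b):\n    return a + b\n")

    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        header = f.read(HEADER_SIZE)
        # Everything after the header is a marshaled code object.
        module_code = marshal.loads(f.read())

# The first four header bytes are the interpreter's magic number.
print(header[:4] == importlib.util.MAGIC_NUMBER)

# Names of the functions nested in the module's constants.
function_names = [c.co_name for c in module_code.co_consts if hasattr(c, "co_name")]
print(function_names)

# Human-readable listing of the module bytecode and nested functions.
dis.dis(module_code)
```
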
Python's bytecode is for a stack machine, if I'm not mistaken, and such bytecode is a serialization format for ASTs - a post-order traversal for expressions. Interpret stack machine bytecode symbolically and it reconstructs an AST:

Compilation:

  1 + 2 => (+ 1 2) => push 1, push 2, add

Interpretation:

  push 1 => 1
  push 2 => 2, 1
  add    => (+ 1 2)

Control flow makes things slightly more complicated, but not for predictable code generation.
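
The same idea as a toy sketch in Python. This is not CPython's real instruction set: the `push`/`add`/`mul` opcodes and the nested-tuple tree format are invented here, and only binary operators on straight-line code are handled.

```python
def compile_expr(tree):
    """Post-order traversal: emit stack-machine instructions for a nested-tuple AST."""
    if not isinstance(tree, tuple):
        return [("push", tree)]
    op, left, right = tree
    return compile_expr(left) + compile_expr(right) + [(op, None)]


def decompile(program):
    """Run the program symbolically: the stack holds subtrees instead of values."""
    stack = []
    for opcode, arg in program:
        if opcode == "push":
            stack.append(arg)
        else:
            # Binary operator: pop both operands and push the rebuilt subtree.
            right = stack.pop()
            left = stack.pop()
            stack.append((opcode, left, right))
    if len(stack) != 1:
        raise ValueError("program does not leave exactly one expression")
    return stack[0]


tree = ("add", 1, ("mul", 2, 3))
program = compile_expr(tree)
print(program)
print(decompile(program) == tree)
```

Because every operator consumes and produces a fixed number of stack entries, replaying the instructions with subtrees in place of values gives back the original expression tree.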

Obfuscated bytecode which, for example, doesn't maintain consistent interpreter stack depths for every code path (illegal for the JVM or the .NET CLR) would make things a little harder to analyze, but I doubt that's often the case in practice with Python.

Yeah, the reconstruction isn't hard at all, but it's not a direct 1:1 mapping to the AST, since multiple control flow structures in the AST can become the same thing in bytecode. That said, it's quite simple to make it Good Enough (TM); the reason I wrote that and the RMarshal module was that I was writing a Python decompiler as part of a larger commercial project. I should release the decompiler at some point.
There's also https://github.com/gstarnberger/uncompyle which automates .pyc to .py "uncompyling"