Hacker News
by 0x0 3938 days ago
Is this about data encryption, or is it about hiding the inner workings of an executable binary (malware packing/copy protection)?

The last time I saw this, the idea was to fully unwind a program into a huge, very deep, loop-free network of logic gates. After that, it was possible to create a larger network which produced the same outputs for the same inputs, but with its internal structure obfuscated in a way that was theoretically difficult to undo. This was an interesting concept, but it didn't immediately lead anywhere useful. Has there been progress?
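
To make the "same outputs for the same inputs" part concrete, here is a toy sketch in Python. The gate-list representation and the names `evaluate`, `equivalent`, `small` and `large` are made up for illustration. It only shows what functional equivalence of two loop-free circuits means. The larger circuit here is not obfuscated in any cryptographic sense.

```python
from itertools import product

# Hypothetical toy format: a circuit is a list of gates (op, a, b), where a and b
# are indices of earlier wires. Wires 0..n_inputs-1 hold the input bits.
def evaluate(circuit, inputs):
    wires = list(inputs)
    for op, a, b in circuit:
        if op == "AND":
            wires.append(wires[a] & wires[b])
        elif op == "OR":
            wires.append(wires[a] | wires[b])
        elif op == "XOR":
            wires.append(wires[a] ^ wires[b])
        elif op == "NOT":
            wires.append(1 - wires[a])  # b is ignored for NOT
        else:
            raise ValueError(f"unknown gate: {op}")
    return wires[-1]  # the last wire is the output

# x0 XOR x1, computed with a single gate.
small = [("XOR", 0, 1)]

# The same function, spelled out as (x0 OR x1) AND NOT (x0 AND x1).
large = [
    ("OR", 0, 1),   # wire 2
    ("AND", 0, 1),  # wire 3
    ("NOT", 3, 3),  # wire 4
    ("AND", 2, 4),  # wire 5 (output)
]

def equivalent(c1, c2, n_inputs):
    # Brute force over all inputs; only feasible for tiny circuits.
    return all(
        evaluate(c1, bits) == evaluate(c2, bits)
        for bits in product((0, 1), repeat=n_inputs)
    )

print(equivalent(small, large, 2))  # → True
```

The hard part, which this sketch does not attempt, is producing the larger circuit so that its structure reveals nothing beyond the input/output behaviour.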
what's the difference? ;)
The short answer is "only in how it is used".

Security is a double-edged sword. You can use it to protect against others, and others can also use it to protect against you. As computer security becomes stronger, I think it could be almost irresponsible to only mention the "good" uses - hackers accessing your bank account seems to be the cliche example - without also mentioning the malware-hiding, user-hostile, locked-down devices and DRM uses. After seeing what's happened to computing over the years, I'm starting to think that maybe such strong security is not good for society as a whole after all...

For malware hiding, couldn't you avoid that by demanding that programs contain mathematical proofs that they do not do a certain number of malicious things, or that they only do a particular non-malicious thing?
No you can't, because you can't define what is malicious.

Your partner going through your underwear drawer is perfectly fine. However, if you see a stranger doing the same, well, it's about time to call the cops.

To expand on that: what the article is basically talking about is "DRM that actually works"—the ability to send someone some encrypted data embedded in a wrapper program. You can run the wrapper program and interrogate it all you like on its own terms—but other than satisfying the desire of the wrapper program's code-paths, there's no way to get decrypted data out.

If the data is dumb content like text, this amounts to regular DRM content encryption, except that there's no decryption key to be found in the wrapper program or anywhere else; the key is "baked into" the logic of the program in a non-recoverable way. (This would allow for things like "true" TPM chips, that can store your keys opaquely from forensic recovery.)

If, on the other hand, the data is itself a program for which the wrapper serves as an interpreter, this amounts to a mathematical basis for a real "Trusted Computing Base", enabling any manner of things, like simple distributed computation on untrusted hardware, or mathematically-strong anti-cheating protection for an MMO game, or satisfying cell carriers' desires for a protected "baseband processor" under their control without that needing to be instantiated as a physical chip.

Effectively, creating a wrapper VM (the "bootstrap program" in the article's terminology) would allow a processor to run a "binary" through the VM that is literally opaque to it; code that, even in its operation as instructions on the CPU, the CPU is incapable of comprehending or interfering with (beyond simply terminating/interrupting the wrapper VM, or restricting its hardware access.) Not only would the interpreted program's code itself be opaque; the working state—the contents of the wrapper program's memory (and the processor's registers, and whatever else) would be opaque. The only place you could see such a program's intent realized would be in the IO it does—and that might be just encrypted network traffic sent to peers, too.

Such a software process, if given a full CPU hypervisor slot rather than having to make system calls to an OS, would be for the first time a "first-class citizen" on a computer, functioning more like[1] a flashable FPGA coprocessor connected to the CPU than a series of instructions that the CPU can edit to its whims. The CPU could ignore such a coprocessor—choose to not interact with it or power it (not emulate it, in other words), or tell the IOMMU to remove the coprocessor's access to peripherals, etc. But the CPU couldn't reach inside the coprocessor to fiddle with it, even though it's a virtual coprocessor residing entirely within "the mind of" the CPU. [The CPU could arbitrarily corrupt the memory the coprocessor was using for its state—but with good encryption, that would just immediately crash the wrapper VM with an assertion failure, rather than leaking any info.]

---

[1] Note that this is just an analogy from the CPU's perspective; we already have flashable coprocessors, but that doesn't help us any, because while the CPU can't poke into them, people can. Indistinguishability Obfuscation means that we're in the position the CPU is in; we can no more see into the VM or its state than the CPU can reach over and take apart a coprocessor.
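
The bracketed point about corrupted state causing an assertion failure can be sketched roughly like this. It assumes the wrapper keeps a message authentication code over its saved state. `KEY`, `seal` and `unseal` are hypothetical names, and in the scenario described the key would be hidden inside the obfuscated wrapper rather than sitting in a Python variable. The sketch covers only tamper detection; a real design would also encrypt the state.

```python
import hashlib
import hmac
import os

# Hypothetical secret. In the scenario above it would be baked into the
# obfuscated wrapper, not stored in plain memory like this.
KEY = os.urandom(32)
TAG_LEN = 32  # SHA-256 digest size in bytes

def seal(state: bytes) -> bytes:
    """Prefix the state with an HMAC tag before it leaves the wrapper."""
    tag = hmac.new(KEY, state, hashlib.sha256).digest()
    return tag + state

def unseal(blob: bytes) -> bytes:
    """Return the state, or fail if anything outside the wrapper changed it."""
    tag, state = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(KEY, state, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state was modified outside the wrapper")
    return state

blob = seal(b"registers and heap")
assert unseal(blob) == b"registers and heap"

# The host flips a single bit in the stored state.
tampered = bytearray(blob)
tampered[-1] ^= 1
try:
    unseal(bytes(tampered))
except ValueError as err:
    print("refusing to continue:", err)
```

The host can still destroy the state, but it cannot steer the wrapper into a modified state of its choosing without knowing the key.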

There are "good" and "bad" uses for this technology.

Bad uses: Netflix will put their video stuff into this and now you will never jack content from their software.

Good uses: Your IM and email can live in this and no compromise of your host operating system can leak your information. Your computer can be hacked by every hacker on Earth simultaneously and your secrets are safe.

> Netflix will put their video stuff into this and now you will never jack content from their software.

Yep. It is good I cannot record what is on the screen. Or in the video buffer.

It still only takes 1 person with legitimate access to share it. DRM is always defeatable.
The "real honest-to-god TPM" I was talking about? That would also be the basis for an HDCP-like system that actually worked indefinitely. Communications between your device and your display would be encrypted with keys that aren't extractable from the memory of either.

Of course, the signal is going to end up decrypted at the DAC interface, and you can always capture it there. But that doesn't give you the original encrypted data stream; it just gives you the result of applying the wrapper program to the encrypted data stream. Which might involve, say, per-customer watermarking, enabling them to very firmly trace the source of a given leak.

(And the watermark could be constructed so that anything that you could do to remove it would involve severely reducing the fidelity of the video. You might be able to restore the fidelity by gathering and averaging many different customers' streams, but not if the watermark involves "signal" rather than "noise"—for example, assuming a cartoon, realtime replacement of the patterns on a character's clothes with a catalogue of different textures, which would be made to average out to harsh static. Remember, your own computer is doing the processing to insert this stuff—it can afford to give you slow, individual attention, in a way that the provider's CDN servers just can't.)

You can't extract the keys, but you can recover a copy of the entire program, which lets you do the same thing: impersonate the 'TPM'. The only difference is that, as you say, any per-user transformation of the data could be done on the user's PC rather than in the cloud somewhere - but considering the orders of magnitude of overhead the obfuscation would probably impose even if the current research is vastly improved on (though I'm just guessing), I suspect it wouldn't be worth it in practice.
So then you use two unique captures of the same material to find the deltas and munge away.
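
A rough sketch of that diffing idea, under a big assumption: the watermark is a naive one that hides a customer ID in the least significant bits of the first few samples. The scheme and the names `watermark` and `differing_positions` are hypothetical, and this is exactly the "noise" style of watermark that the parent comment says a better design would avoid.

```python
def watermark(samples, customer_id, n_bits=8):
    """Hide customer_id in the least significant bit of the first n_bits samples."""
    marked = list(samples)
    for i in range(n_bits):
        bit = (customer_id >> i) & 1
        marked[i] = (marked[i] & ~1) | bit
    return marked

def differing_positions(copy_a, copy_b):
    """Positions where two captures of the same material disagree."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]

original = [200, 13, 77, 42, 90, 155, 31, 8, 64, 120]
alice = watermark(original, 0b10110010)
bob = watermark(original, 0b01100110)

print(differing_positions(alice, bob))  # → [2, 4, 6, 7]
```

Two captures only expose the positions where the two IDs differ. Positions where both customers carry the same bit stay hidden, so more captures are needed to find the rest.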
There will be no "easy" ways of doing that, but ICE (in-circuit emulators) and, initially, ring0 debuggers will still be able to defeat it.

If the bits are in your hardware, you'll be able to read them. No matter how hard it might be, it will still be possible.

A good example is security tokens: there are quite a few labs that can extract the keys from most common security tokens through various means. This process will cost you anywhere from a couple of thousand dollars for tokens that are vulnerable to side-channel attacks to hundreds of thousands for tokens that require you to dismantle the IC and probe it directly, but for the most part it's still possible.