We've come a long way since the first IO result came out. Since then, we've gotten a couple more multilinear map candidates (though most are now broken), and some simpler constructions, but we're still really far from IO with a proof. This is primarily because of the underlying multilinear map that's being used. The Gentry et al result that proves IO secure in the generic multilinear model isn't that useful yet simply because there have been so many nongeneric attacks against mmap candidates, especially when they're used in IO. That is, at the moment there's no reason to believe that the generic multilinear model is even a good way to think about IO security.
What would be a really big result is finding IO that doesn't rely on multilinear maps.
The last time I saw this, the idea was to fully unwind a program into a huge, very deep, loop-free network of logic gates. After that, it was possible to create a larger network which produced the same outputs for the same inputs, but with internal structure obfuscated in a way that was theoretically difficult to undo. This was an interesting concept, but didn't immediately lead anywhere useful. Has there been progress?
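To make the "unwind into gates, then build a larger equivalent network" idea concrete, here is a minimal sketch. The gate format and the function names (eval_circuit, unroll_parity, padded_parity) are made up for illustration. The padding used here (double negation) is trivially reversible and is NOT an obfuscation scheme; it only shows what "bigger network, same input/output behaviour" means.

```python
from itertools import product

# A gate is (op, a, b), where a and b are wire indices.
# Wires 0..n-1 are the inputs; each gate appends one new wire.
def eval_circuit(gates, bits):
    wires = list(bits)
    for op, a, b in gates:
        if op == "AND":
            wires.append(wires[a] & wires[b])
        elif op == "XOR":
            wires.append(wires[a] ^ wires[b])
        elif op == "NOT":
            wires.append(1 - wires[a])  # b is ignored for NOT
        else:
            raise ValueError(op)
    return wires[-1]  # the last wire is the output

def unroll_parity(n):
    """Unroll `acc = bits[0]; for i in 1..n-1: acc ^= bits[i]` into gates."""
    gates, acc = [], 0
    for i in range(1, n):
        gates.append(("XOR", acc, i))
        acc = n + len(gates) - 1  # index of the wire just created
    return gates

def padded_parity(n):
    """Same function, three times the gates: each XOR is followed by NOT NOT."""
    gates, acc = [], 0
    for i in range(1, n):
        gates.append(("XOR", acc, i))
        t = n + len(gates) - 1
        gates.append(("NOT", t, t))
        u = n + len(gates) - 1
        gates.append(("NOT", u, u))
        acc = n + len(gates) - 1
    return gates

def equivalent(n, g1, g2):
    """Brute-force check that two circuits agree on every n-bit input."""
    return all(eval_circuit(g1, bits) == eval_circuit(g2, bits)
               for bits in product((0, 1), repeat=n))

small, big = unroll_parity(4), padded_parity(4)
print(len(small), len(big), equivalent(4, small, big))
```

The brute-force equivalence check is exponential in the number of inputs, which is fine for a toy and hopeless for a real program.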
Security is a double-edged sword. You can use it to protect against others, and others can also use it to protect against you. As computer security becomes stronger, I think it could be almost irresponsible to only mention the "good" uses - hackers accessing your bank account seems to be the cliche example - without also mentioning the malware-hiding, user-hostile, locked-down devices and DRM uses. After seeing what's happened to computing over the years, I'm starting to think that maybe such strong security is not good for society as a whole after all...
For malware hiding, couldn't you avoid that by demanding that programs contain mathematical proofs that they do not do certain malicious things, or that they only do a particular non-malicious thing?
No you can't, because you can't define what is malicious.
Your partner going through your underwear drawer is perfectly fine. However, if you see a stranger doing the same, well, it's about time to call the cops.
To expand on that: what the article is basically talking about is "DRM that actually works"—the ability to send someone some encrypted data embedded in a wrapper program. You can run the wrapper program and interrogate it all you like on its own terms—but other than satisfying the desire of the wrapper program's code-paths, there's no way to get decrypted data out.
If the data is dumb content like text, this amounts to regular DRM content encryption, except that there's no decryption key to be found in the wrapper program or anywhere else; the key is "baked into" the logic of the program in a non-recoverable way. (This would allow for things like "true" TPM chips, that can store your keys opaquely from forensic recovery.)
If, on the other hand, the data is itself a program for which the wrapper serves as an interpreter, this amounts to a mathematical basis for a real "Trusted Computing Base", enabling any manner of things, like simple distributed computation on untrusted hardware, or mathematically-strong anti-cheating protection for an MMO game, or satisfying cell carriers' desires for a protected "baseband processor" under their control without that needing to be instantiated as a physical chip.
Effectively, creating a wrapper VM (the "bootstrap program" in the article's terminology) would allow a processor to run a "binary" through the VM that is literally opaque to it; code that, even in its operation as instructions on the CPU, the CPU is incapable of comprehending or interfering with (beyond simply terminating/interrupting the wrapper VM, or restricting its hardware access.) Not only would the interpreted program's code itself be opaque; the working state—the contents of the wrapper program's memory (and the processor's registers, and whatever else) would be opaque. The only place you could see such a program's intent realized would be in the IO it does—and that might be just encrypted network traffic sent to peers, too.
Such a software process, if given a full CPU hypervisor slot rather than having to make system calls to an OS, would be for the first time a "first-class citizen" on a computer, functioning more like[1] a flashable FPGA coprocessor connected to the CPU than a series of instructions that the CPU can edit to its whims. The CPU could ignore such a coprocessor—choose to not interact with it or power it (not emulate it, in other words), or tell the IOMMU to remove the coprocessor's access to peripherals, etc. But the CPU couldn't reach inside the coprocessor to fiddle with it, even though it's a virtual coprocessor residing entirely within "the mind of" the CPU. [The CPU could arbitrarily corrupt the memory the coprocessor was using for its state—but with good encryption, that would just immediately crash the wrapper VM with an assertion failure, rather than leaking any info.]
---
[1] Note that this is just an analogy from the CPU's perspective; we already have flashable coprocessors, but that doesn't help us any, because while the CPU can't poke into them, people can. Indistinguishability Obfuscation means that we're in the position the CPU is in; we can no more see into the VM or its state than the CPU can reach over and take apart a coprocessor.
There are "good" and "bad" uses for this technology.
Bad uses: Netflix will put their video stuff into this and now you will never jack content from their software.
Good uses: Your IM and email can live in this and no compromise of your host operating system can leak your information. Your computer can be hacked by every hacker on Earth simultaneously and your secrets are safe.
The "real honest-to-god TPM" I was talking about? That would also be the basis for an HDCP-like system that actually worked indefinitely. Communications between your device and your display would be encrypted with keys that aren't extractable from the memory of either.
Of course, the signal is going to end up decrypted at the DAC interface, and you can always capture it there. But that doesn't give you the original encrypted data stream; it just gives you the result of applying the wrapper program to the encrypted data stream. Which might involve, say, per-customer watermarking, enabling them to very firmly trace the source of a given leak.
(And the watermark could be constructed so that anything that you could do to remove it would involve severely reducing the fidelity of the video. You might be able to restore the fidelity by gathering and averaging many different customers' streams, but not if the watermark involves "signal" rather than "noise"—for example, assuming a cartoon, realtime replacement of the patterns on a character's clothes with a catalogue of different textures, which would be made to average out to harsh static. Remember, your own computer is doing the processing to insert this stuff—it can afford to give you slow, individual attention, in a way that the provider's CDN servers just can't.)
You can't extract the keys, but you can recover a copy of the entire program, which lets you do the same thing: impersonate the 'TPM'. The only difference is that, as you say, any per-user transformation of the data could be done on the user's PC rather than in the cloud somewhere - but considering the orders of magnitude of overhead the obfuscation would probably impose even if the current research is vastly improved on (though I'm just guessing), I suspect it wouldn't be worth it in practice.
There will be no "easy" way of doing that; ICE and, initially, ring0 debuggers will still be able to defeat it.
If the bits are in your hardware, you'll be able to read them. No matter how hard it might be, it will still be possible.
A good example is security tokens: there are quite a few labs that can extract the keys from most common security tokens through various means.
This process will cost you anywhere from a couple of thousand dollars for tokens that are vulnerable to side-channel attacks to hundreds of thousands for tokens that require you to dismantle the IC and probe it directly, but for the most part it's still possible.
I remember pretty clearly reading a comment by the author of the paper about the 'unbreakable obfuscation' in which he said that the paper was greatly misrepresented in that it had made a proof in a specific problem domain that wasn't so applicable to real software.
I'm pretty sure it was posted on HN at some point. I don't remember the term IO being used, so it may have been a different kind of obfuscation. There were some allusions made to an unsolvable jigsaw puzzle.
I'd like to note that IO does not give a guarantee of impossibility of extracting keys.
AFAIK, the definition of IO is: we have two programs that perform the same computation.
After we apply IO to both programs, we cannot figure out which obfuscated program corresponds to a particular original program.
However, there is a flaw: programs encrypting data with different keys are performing different computations.
So the IO definition does not claim that IO is able to hide the key.
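A toy way to see both halves of that argument, as a sketch only: take an "obfuscator" that simply outputs the program's full truth table. It is exponentially large and useless in practice, but two programs that compute the same function get literally identical outputs, so nothing about which original you started from survives. Two programs with different baked-in keys compute different functions, so the guarantee says nothing about them. All names here (truth_table_obfuscate, prog_a, prog_b, make_keyed) are invented for the example.

```python
from itertools import product

def truth_table_obfuscate(program, n_bits):
    """Toy, exponentially inefficient 'obfuscator': emit the whole truth table."""
    return tuple(program(bits) for bits in product((0, 1), repeat=n_bits))

# Two different-looking programs that compute the same function (XOR).
def prog_a(bits):
    x, y = bits
    return x ^ y

def prog_b(bits):
    x, y = bits
    return (x | y) & (1 - (x & y))

# A family of programs with a key baked in; different keys give
# different functions, so they fall outside the IO guarantee.
def make_keyed(key):
    def prog(bits):
        return tuple(b ^ k for b, k in zip(bits, key))
    return prog

same = truth_table_obfuscate(prog_a, 2) == truth_table_obfuscate(prog_b, 2)
diff = (truth_table_obfuscate(make_keyed((0, 1)), 2)
        != truth_table_obfuscate(make_keyed((1, 1)), 2))
print(same, diff)
```

In this toy the keyed programs are not just outside the guarantee, the key is trivially readable from the table, which is the worry in its starkest form.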
Your initial thought is why most people believed there to be little use in IO other than maybe removing software watermarks and the like. But this idea of a "punctured program" came around in which you can place the key in the program in a very clever way such that you get a security proof about hiding the key in the obfuscated program.
It turns out we can do just about anything in modern crypto using IO - it is an extremely powerful primitive - including symmetric encryption, public-key encryption, etc.
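As far as I understand it, the "punctured program" trick is usually paired with a puncturable PRF: a key that can be cut down so it still evaluates everywhere except at one chosen point. Here is a minimal sketch of the classic tree-based (GGM-style) construction. Using SHA-256 as the length-doubling generator is an assumption made for the sketch, and the function names are made up; this shows only the puncturing step, not the obfuscation or the security proof.

```python
import hashlib

def prg(seed: bytes):
    """Length-doubling generator stand-in (assumption: SHA-256 with a tag)."""
    return (hashlib.sha256(b"L" + seed).digest(),
            hashlib.sha256(b"R" + seed).digest())

def prf(key: bytes, x: str) -> bytes:
    """Walk the binary tree along the bits of x ('0' = left, '1' = right)."""
    node = key
    for bit in x:
        node = prg(node)[int(bit)]
    return node

def puncture(key: bytes, x_star: str) -> dict:
    """Keep only the sibling seeds along the path to x_star."""
    punctured, node = {}, key
    for i, bit in enumerate(x_star):
        left, right = prg(node)
        if bit == "0":
            punctured[x_star[:i] + "1"] = right  # store the sibling subtree
            node = left
        else:
            punctured[x_star[:i] + "0"] = left
            node = right
    return punctured

def eval_punctured(punctured: dict, x: str) -> bytes:
    """Evaluate at any x of the same length as x_star, except x_star itself."""
    for prefix, seed in punctured.items():
        if x.startswith(prefix):
            return prf(seed, x[len(prefix):])
    raise ValueError("cannot evaluate at the punctured point")

key = b"\x00" * 32          # demo key only
x_star = "1011"
pk = puncture(key, x_star)
inputs = [format(i, "04b") for i in range(16)]
print(all(eval_punctured(pk, x) == prf(key, x) for x in inputs if x != x_star))
```

The punctured key agrees with the full key on every other input, while the value at x_star is never stored in it. The security arguments then swap a program holding the full key for one holding the punctured key, which is allowed when the two programs compute the same function.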
>> So the IO definition does not claim that IO is able to hide the key.
From what I've read, that doesn't even matter. The obfuscated program IS effectively the key. A copy of that obfuscated program is still a copy of the key. It's still not clear to me what the advantage is supposed to be.