Hacker News
by captainmuon 3261 days ago
This looks interesting, but I wonder how safe it is in the stated use case of journalists and activists in an authoritarian country. It can use Tor, which hides whom you are communicating with, but the fact that you are using Tor sticks out like a sore thumb.

The authorities probably just have to flip a switch to put you under closer surveillance if they see you use Tor. Or they'll just send someone to your registered address and see what's going on.

What I really think would be cool would be a protocol based on massive steganography and obfuscation. You would have kernels which tell it how to wrap data in an innocent-looking container (HTTPS traffic, SMTP, IRC, cat pictures and recipes over plain HTTP, DNS, ICMP pings, ...). Ideally, you would have dozens. And they would be shareable between nodes. You could define them in a DSL, and make them sandboxed and provable (that they round-trip, i.e. can decode what they encode, and terminate properly - that restricts what you can do in them though). You could even autogenerate the kernels. The last two points would require a bit of R&D of course.

The goal would be to be able to create new "protocols" faster than authorities can learn to detect them. Then wrap a regular encrypted protocol in this obfuscation layer.
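
To make the round-trip requirement concrete, here is a minimal Python sketch of what a kernel could look like. Everything in it is an assumption for illustration: the `Kernel` type, the toy "recipe" encoding, and the `round_trips` helper are hypothetical names, and the check is randomized testing, not the kind of proof a restricted DSL would be meant to give.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel: a named pair of encode/decode functions."""
    name: str
    encode: Callable[[bytes], bytes]  # payload -> innocent-looking cover
    decode: Callable[[bytes], bytes]  # cover -> payload


def recipe_encode(payload: bytes) -> bytes:
    # Disguise each payload byte as an ingredient quantity, one per line.
    return "\n".join(f"{b} g flour" for b in payload).encode("ascii")


def recipe_decode(cover: bytes) -> bytes:
    text = cover.decode("ascii")
    if not text:
        return b""
    # The quantity is the first space-separated token on each line.
    return bytes(int(line.split(" ", 1)[0]) for line in text.split("\n"))


RECIPE = Kernel("recipe", recipe_encode, recipe_decode)


def round_trips(kernel: Kernel, trials: int = 200, max_len: int = 64, seed: int = 0) -> bool:
    """Check decode(encode(p)) == p on random payloads (a test, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        payload = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))
        if kernel.decode(kernel.encode(payload)) != payload:
            return False
    return True
```

A toy cover like this would fool nobody. The point is only the shape: a kernel is data plus two functions, and the round-trip property can be checked mechanically before a node accepts it. The payload passed to `encode` would already be ciphertext from the regular encrypted protocol.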

4 comments

What you're describing already exists: obfsproxy. Unless I'm misunderstanding your proposal.
Briar is designed with a distinct transport layer so it is entirely possible to support this type of thing.
Maybe I'm missing something, but if the protocol were well defined and open source, it would be trivial to detect, no?
Not really, the idea would be to hide data by using different amounts of spaces in text files, in the least significant bits of pixels in images, or in the access pattern to a certain service. The data looks like legitimate traffic. You could run the tool on absolutely all traffic, but that would be computationally intensive. And the data you get out is still encrypted, so ideally you can't tell if it is random (from extracting data where none is hidden) or real encrypted data.
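
As a rough illustration of the "least significant bits of pixels" variant, here is a small Python sketch. It assumes the cover is a flat buffer of raw 8-bit samples (a real image would need a lossless format and a decoder in front), and the function names `embed_lsb` and `extract_lsb` are made up for this example.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytes:
    """Hide payload bits in the lowest bit of each sample, MSB of each byte first."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits of the first 8 * n_bytes samples."""
    if n_bytes * 8 > len(pixels):
        raise ValueError("cover too small for requested length")
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)
```

Each sample changes by at most one intensity level. Running `extract_lsb` on a cover with nothing hidden in it still returns bytes, just meaningless ones, which is the situation described above where the extractor cannot tell noise from ciphertext by looking at the output alone.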

Also, you would have dozens or hundreds of kernels, and you could generate them by analyzing innocent traffic, or hiring a bunch of students to write them quickly. My idea is that the kernels are not part of the source code per se, but rather distributed by the protocol. To contact somebody you need to speak a common kernel, but then they can send you new kernels automatically. You could come up with a measure of how well kernels survive censorship and use that to decide which to pass on.
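
One possible shape for such a survival measure, sketched in Python. The `KernelStats` class, the smoothed success rate, and `kernels_to_share` are all assumptions made up for illustration; the thread does not specify any particular metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KernelStats:
    """Hypothetical per-kernel counters kept by a node."""
    name: str
    delivered: int = 0  # sessions that got through
    blocked: int = 0    # sessions that were visibly blocked or reset

    def record(self, ok: bool) -> None:
        if ok:
            self.delivered += 1
        else:
            self.blocked += 1

    @property
    def survival(self) -> float:
        # Laplace-smoothed success rate, so an untested kernel scores 0.5
        # instead of dividing by zero or looking perfect.
        return (self.delivered + 1) / (self.delivered + self.blocked + 2)


def kernels_to_share(stats: List[KernelStats], k: int = 2) -> List[str]:
    """Return the names of the k kernels with the highest survival score."""
    ranked = sorted(stats, key=lambda s: s.survival, reverse=True)
    return [s.name for s in ranked[:k]]
```

A score like this only sees blocking that the node can observe. A kernel that has been identified but is deliberately left unblocked would keep scoring well.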

It's a bit like auto-updating malware, but for good :-). My only novel idea is to make a DSL or bytecode for the kernels, so that you can prove that they are benign and correct, and autogenerate them or use kernels from strangers. I don't know at all if this is feasible or not, but I have a couple of ideas how to make it work. Nowhere near a PoC yet, so this is all still wishful thinking.

"the idea would be to hide data by using different amounts of spaces in text files, in the least significant bits of pixels in images, or in the access pattern to a certain service" is not appropriate for the claimed use case, i.e. activists in totalitarian regimes.

In such an environment, the traffic of suspected activists will be analyzed.

Assuming the kernels are open, it's possible to see in analysis of certain data that "amounts of spaces in text files, in the least significant bits of pixels in images, or in the access pattern to a certain service" have encoded information, even if the extracted information looks like random/encrypted data. At this point you don't have plausible deniability and rubber-hose cryptanalysis can be used.
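
A toy Python example of that kind of analysis, using a made-up whitespace kernel that stores one bit per line as a trailing space. The encoder, the `trailing_space_ratio` statistic, and the 0.2 threshold are all assumptions chosen for illustration, not a description of any real tool.

```python
from typing import List


def encode_trailing_spaces(cover_lines: List[str], bits: List[int]) -> List[str]:
    """Hypothetical kernel: a trailing space on line i encodes bit 1, none encodes 0."""
    if len(bits) > len(cover_lines):
        raise ValueError("cover too small for payload")
    out = []
    for i, line in enumerate(cover_lines):
        line = line.rstrip()
        if i < len(bits) and bits[i]:
            line += " "
        out.append(line)
    return out


def trailing_space_ratio(lines: List[str]) -> float:
    """Fraction of lines that end in whitespace."""
    if not lines:
        return 0.0
    return sum(1 for line in lines if line != line.rstrip()) / len(lines)


def looks_suspicious(lines: List[str], threshold: float = 0.2) -> bool:
    # Ordinary text rarely has trailing whitespace on many lines, while
    # encrypted payload bits put it on roughly half of them.
    return trailing_space_ratio(lines) > threshold
```

The analyst never needs to decrypt anything. Knowing the kernel is enough to know which statistic to look at, and the carrier itself gives the sender away.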

Switching to new kernels happens too late since you don't know when they've identified a kernel until they start arresting people - it's not like they're simply going to block it immediately.

In other words, the described service is resistant to mass censorship and automated filtering, but these use cases actually need to be able to resist attribution and retaliation, which are quite different problems.

the code is to find, analyze it