by dfranke 5685 days ago
I don't know enough about biology to comment on their storage-density claims, but as to the encryption, I'm getting a strong whiff of snake oil from "only the client would know the function to derive the checksum". If you want to convince me that bioinformatics has something to offer cryptology, then you need to explain to me what property wetware has that silicon doesn't which causes it to be unlike a classical Turing machine.

The mass of a single base pair is about 1.08E-21 grams. That's 1.85E21 bits[1] of information in a single gram of purified DNA, about a fourth of a zettabyte. So, if they're using DNA as the storage mechanism (the slideshow linked in the article indicates that they are), then 90GB is pretty insignificant. Sure, the bacteria will all be pretty filled with DNA, but it's not especially outlandish. Throwing compression at the information (as the slideshow discusses) makes it even less outlandish.

[1] Each base pair encodes two bits, as DNA and RNA are basically base-four sequences (when we're thinking about them as data storage).
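
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes only the two figures given above (1.08E-21 grams per base pair, two bits per base pair) and ignores any real-world overhead such as redundancy or packaging; the function names are made up for illustration.

```python
# Back-of-the-envelope check of the storage-density figure quoted above.
BASE_PAIR_MASS_G = 1.08e-21   # mass of one base pair, as stated in the comment
BITS_PER_BASE_PAIR = 2        # four possible bases -> log2(4) = 2 bits


def bits_per_gram(bp_mass_g=BASE_PAIR_MASS_G, bits_per_bp=BITS_PER_BASE_PAIR):
    """Raw information capacity of one gram of DNA, in bits."""
    return bits_per_bp / bp_mass_g


def zettabytes_per_gram():
    """Same capacity expressed in zettabytes (1 ZB = 1e21 bytes)."""
    return bits_per_gram() / 8 / 1e21


def grams_needed(num_bytes):
    """Mass of DNA needed to hold num_bytes, ignoring all overhead."""
    return num_bytes * 8 / bits_per_gram()


if __name__ == "__main__":
    print(f"{bits_per_gram():.3g} bits per gram")
    print(f"{zettabytes_per_gram():.3f} ZB per gram")
    print(f"{grams_needed(90e9):.3g} g for 90 GB")
```

Under those assumptions, 90GB needs well under a nanogram of DNA, which is why the figure is not outlandish.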

If we were to use yeast, we could additionally include methylation. We don't have to stop there: we could encode information in histone acetylation states, transcription levels, etc. to increase this density even further. Granted, how practical would that be?

Using the regulation machinery might be an interesting way to decrypt messages, if it were a sufficiently complex signalling pathway...

Yeast don't methylate.

Histone acetylation sites? Epigenetic information is too transient and is not necessarily passed down in a 1:1 fashion, which is really bad if you're trying to store data reliably.

Actually, knowing a 'good' (cryptographically) checksum function is equivalent to having a good encryption scheme. I believe it was Rivest who showed this, sometime in the late 1990s.

He suggested, for instance, blasting out a sequence of bits; if a block checksums to a certain number or matches a function, it has 'your' bits. An observer of the stream would see a bunch of random data. You would see: garbage-garbage-bits-garbage-garbage-garbage-garbage-bits-bits, etc.

This principle could work well in the system they describe.
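
A minimal sketch of the idea, assuming HMAC-SHA256 stands in for the 'good' checksum and that each block carries a sequence number, one bit, and a tag. The packet layout and the names `chaff` and `winnow` are my own illustration, not anything taken from the article.

```python
import hashlib
import hmac
import secrets


def _tag(key: bytes, seq: int, bit: int) -> bytes:
    """Keyed checksum over (sequence number, bit)."""
    msg = seq.to_bytes(8, "big") + bytes([bit])
    return hmac.new(key, msg, hashlib.sha256).digest()


def chaff(key: bytes, bits):
    """Emit two packets per bit: the real one with a valid tag and a
    decoy carrying the opposite bit with a random tag."""
    rng = secrets.SystemRandom()
    packets = []
    for seq, bit in enumerate(bits):
        real = (seq, bit, _tag(key, seq, bit))
        decoy = (seq, 1 - bit, secrets.token_bytes(32))
        pair = [real, decoy]
        rng.shuffle(pair)          # hide which packet of the pair is real
        packets.extend(pair)
    return packets


def winnow(key: bytes, packets):
    """Keep only packets whose tag verifies under the key."""
    kept = {}
    for seq, bit, tag in packets:
        if hmac.compare_digest(tag, _tag(key, seq, bit)):
            kept[seq] = bit
    return [kept[s] for s in sorted(kept)]


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    message = [1, 0, 1, 1, 0]
    stream = chaff(key, message)
    print(winnow(key, stream) == message)
```

Someone without the key sees both values for every position and cannot tell which tag is valid, so the stream looks like garbage to them.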

You can do better than that. Run the hash in HMAC mode and hash successive counter values to get a pseudorandom stream of bits. Xor your plaintext with the stream to get the ciphertext.
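
Roughly, in code. This is a sketch assuming HMAC-SHA256 and an 8-byte big-endian counter; the `nonce` parameter is my own addition so that one key can be used for more than one message, and the function names are illustrative. It provides no integrity protection, and reusing a key and nonce pair would leak the xor of the two plaintexts.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom bytes from HMAC over successive counter values."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out.extend(hmac.new(key, block_input, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Xor data with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    key = b"k" * 32        # placeholder key for the demo only
    nonce = b"n" * 12      # placeholder nonce for the demo only
    ciphertext = xor_crypt(key, nonce, b"attack at dawn")
    print(xor_crypt(key, nonce, ciphertext))
```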

But how does biology contribute to any of this? At best, they've taken a known cryptographic algorithm and figured out how to implement it with the computation done in wetware. At worst, and I suspect the worst, they've simply observed that some parameters of their encoding scheme are tunable, and are claiming that you have a secure cryptosystem if you keep those parameters secret.