by 08-15 3569 days ago
The real problem isn't storing the data, it's accessing it. There is no way to address DNA; you can only "shotgun sequence" it. In doing so, you get random fragments of around 200 bases (400 bits). You can't get just one such fragment: you get half a billion in one go, currently at a cost of around $5000. (Older, much more expensive technology got up to 1000 bases... sometimes, and only 100 fragments per machine run.) So how are you going to access your archive? By sequencing the whole thing and (temporarily) storing it on a hard drive?
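To put those numbers side by side, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (200 bases per fragment, half a billion fragments, $5000 per run) and assumes an ideal 2 bits per base; the function names are made up for illustration, and it ignores overlap between fragments, so the real amount of unique data per run would be lower.

```python
# Figures quoted in the comment above; treat them as rough estimates.
BASES_PER_FRAGMENT = 200
BITS_PER_BASE = 2                  # four nucleotides (A, C, G, T) -> 2 bits each
FRAGMENTS_PER_RUN = 500_000_000    # "half a billion in one go"
COST_PER_RUN_USD = 5000

def run_yield_bytes():
    """Raw data read in one run, assuming every base carries 2 bits."""
    total_bits = FRAGMENTS_PER_RUN * BASES_PER_FRAGMENT * BITS_PER_BASE
    return total_bits // 8

def cost_per_gigabyte():
    """Read cost per (decimal) gigabyte of raw fragments."""
    return COST_PER_RUN_USD / (run_yield_bytes() / 1e9)

print(run_yield_bytes())     # 25000000000 bytes, i.e. about 25 GB per run
print(cost_per_gigabyte())   # 200.0 dollars per GB read
```

So under these assumptions, one run reads about 25 GB of raw fragments, and there is no way to pay for less than the whole run.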

The manufacturers of modern sequencers (both Illumina and ABI) have been talking about this for at least 7 years (i.e. as long as they've been selling high-throughput sequencers). They actually made a weaker claim: according to them, it makes no sense to keep a sequenced genome, because just sequencing it again would be cheaper than storing the data. In these 7 years, it hasn't happened. Instead, ABI's SOLiD technology all but vanished. Actually storing data in DNA is one step further, and it's not going to happen for a long time.

(Source: My employer does a lot of sequencing. I talked to sales representatives of both companies, and I work on data sequenced using Illumina's machines. We store that data on spinning rust.)

2 comments

From what I gather from my own research, the talk about HGP-write, and a few chats with Nick Goldman himself (who is a very funny guy), the main problem is neither storage nor access. Access can be improved by probing, and it is also not that important, since a primary application could be archives. The main problem is synthesis, which still costs at least $1 per 10 bp.
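For a sense of scale, here is a quick sketch of what "$1 per 10 bp" means for writing data. The 2 bits per base is an idealized assumption (real encoding schemes typically store less per base because of redundancy and error correction), and the function name is hypothetical.

```python
BASES_PER_USD = 10     # "$1 per 10 bp", the minimum price quoted above
BITS_PER_BASE = 2      # assumption: ideal encoding, no redundancy

def synthesis_cost_usd(n_bytes):
    """Lower-bound cost of synthesizing n_bytes of data as DNA."""
    bases_needed = n_bytes * 8 // BITS_PER_BASE
    return bases_needed / BASES_PER_USD

print(synthesis_cost_usd(1_000_000))   # 400000.0 dollars for one megabyte
```

At that price a single megabyte costs about $400,000 to write, which is why synthesis rather than sequencing dominates the cost.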

And sequencing will become even cheaper when you do not do it from a library prep but in a controlled buffer environment. It is currently not getting cheaper only because there is no incentive for Illumina to do so (similar to Intel's position in CPUs). Let's hope that ONT, BGI, and whoever else still hopes to get some market share (Ion Torrent, PacBio ...) can force them to evolve (project firefly, yeah).

The cost of synthesis is dropping fast, and will drop even faster in the near future. There are a couple of 'humps' in the demand for synthesis, with plateaus in between. Synthesis between 0 and ~200bp gets you all you need for PCR (copy/paste). But if you can't do ~3000bp, you can't make a full-sized gene. So people get used to PCRing everything, and there is simply no proper demand for anything larger.

But with a few new players on the block (Twist, Gen9, and a few other smaller/newer startups), the goal is to hit economical ~2-3kb, at which point the race is back on again, and whole new markets will open up. And the moment that happens, expect the price to drop again. Competition will kick back in and everyone's price will drop.

The size of a moderate plasmid (~5,000-7,000 bp) is another hurdle, and the size of a small chromosome (~100,000 bp) is another.

Also, if you're ordering DNA in pools or bulk (have a good compression algorithm), you can get the price/bp to come down even more.

There are many ways to address DNA for sequencing that don't involve shotgunning.