| HN Mirror


	by dekhn 1256 days ago
	Oh. From what I can tell, the total world storage for non-human genome data is trivially small (a few petabytes and not growing rapidly). Human is huge: O(petabytes)/year for a single org is not out of the question.

2 comments

v8xi 1256 days ago

That's true, but we do tremendous amounts of human DNA sequencing for certain causes at scale (e.g. understanding/treating cancer), whereas environmental sequencing is usually done to monitor/search for things at a much lower sample rate (e.g. disease load in wastewater, biodiversity from environmental samples, and looking for natural products produced by the zillions of bacteria/archaea in the oceans). From e.g. a wastewater sample perspective the latter type is going to be the majority of the data; we just filter out the stuff of interest and analyze it in situ. But there's no reason to store 1B E. coli genomes, whereas this is necessary if we want to understand cancer evolution.

jefftk 1256 days ago

If you want to use untargeted metagenomics to detect novel human viruses you're going to be generating petabytes all by yourself: https://arxiv.org/pdf/2108.02678.pdf

dekhn 1256 days ago

I can't see any reason why you would need to save petabytes. Remember, at that scale, people think really hard about whether to pay the long-term storage and associated costs (the value of having this system should exceed its costs). The case for this already exists in (for example) cancer and other pharma.

jefftk 1256 days ago

The storage is massively cheaper than the sequencing. At some point it could be worth going back and trying to figure out how much of the raw data you can safely discard, but at least at first there are so many other things that are more urgent.

(The paper I linked describes more or less what I'm currently working on)
