by dekhn 3007 days ago
Oh, you mean BAM files? Get yourself a retention policy; you don't need to keep RNA BAM files that long.

I thought you meant derived data.


I'm talking about the raw reads, which are important if you want to try a different alignment or base-calling method. You can debate how important it is to be able to do that, but I'm not trying to argue that the data should be kept; I was just explaining why the total size of publicly available RNA-seq data (the sum total of which the parent is attempting to organize) runs into the petabytes.

So, do you or the original poster actually have a materialized petabyte of RNA data? Otherwise, you're just describing a million files spread over a million file servers, not being used for science or processed in any way.