by epistasis 3886 days ago
The sequencers can stream to a data analysis center as the data is being generated.

It takes a 100 Mbit/s stream per $1M of sequencing capital, so network connectivity to transfer to a data center is a tiny part of the overall cost.
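To put that rate in terms of storage volume, here is a rough back-of-the-envelope sketch. It assumes "100mbit" means a sustained 100 megabits per second in decimal units and that the sequencers stream around the clock; the function name is purely illustrative.

```python
# Assumption: "100mbit" = a sustained 100 megabits per second (decimal units),
# streaming 24 hours a day. Real instruments will have idle time.
MBIT_PER_S = 100


def sustained_volume_tb(mbit_per_s: float, seconds: float) -> float:
    """Data volume in decimal terabytes produced by a sustained stream."""
    bytes_total = mbit_per_s * 1e6 / 8 * seconds  # bits/s -> bytes/s -> bytes
    return bytes_total / 1e12


per_day = sustained_volume_tb(MBIT_PER_S, 24 * 3600)
per_year = sustained_volume_tb(MBIT_PER_S, 365 * 24 * 3600)
print(f"{per_day:.2f} TB/day, {per_year:.0f} TB/year per $1M of sequencers")
```

Under those assumptions that is on the order of 1 TB per day, or a few hundred TB per year, for every $1M of sequencing capital.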

However, paying for AWS storage is pretty prohibitive, unless you're at a small scale. So big centers will build their own storage facilities.

The small data producers like the ones that the thread author talks about can often use AWS more cost-efficiently than building a compute cluster. However, they need to budget for that, which is not always thought of. They may also need to fight their institute's core center so that they can use DNANexus.


S3 storage is pretty cheap, it's the data egress that really costs.

For academic centers though there is often an incentive to move things in house due to different treatment for capital expenditures and the opportunity to externalize some of your costs from your grant onto central services.

Data transfer costs less than a single year of Glacier storage, so while it's pricey I wouldn't call egress a major portion of the cost.

Keeping this data for less than 5-10 years is pretty questionable, since it's so expensive to generate. Eventually it may be cheaper to store the DNA and resequence if it needs to be looked at again. However, if you're doing petabytes of storage, it's going to be much more economical to have your own storage and compute than to use AWS, particularly at the rate that academic centers pay for sysadmins.

Running a public data portal, our egress costs are higher than our storage costs. (We now proxy downloads through a direct connect to our university network...)

Remember to account for future reductions in storage costs. S3 has come down from $0.1500/GB month in 2010 to $0.0300/GB month today. And the recently introduced infrequent access storage tier is under half that again at $0.0125/GB month. It's now significantly cheaper to use S3/Azure/Google than running the storage ourselves.
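For a sense of scale, here is a quick sketch that turns the per-GB prices quoted above into an annual bill per petabyte. It assumes a decimal petabyte (1,000,000 GB) and counts storage only, with no request or egress charges; the tier labels and function name are just for illustration.

```python
# Prices quoted in the comment above, in USD per GB-month.
PRICES = {
    "S3 (2010)": 0.15,
    "S3 (today)": 0.03,
    "S3 infrequent access": 0.0125,
}

GB_PER_PB = 1_000_000  # assumption: decimal petabyte


def annual_cost_per_pb(price_per_gb_month: float) -> float:
    """Storage-only cost in USD of keeping 1 PB for 12 months."""
    return price_per_gb_month * GB_PER_PB * 12


for tier, price in PRICES.items():
    print(f"{tier}: ${annual_cost_per_pb(price):,.0f} per PB-year")
```

At those prices a petabyte goes from roughly $1.8M per year in 2010 to $360k today and $150k on the infrequent access tier. The source gives no in-house figures, so any comparison against running your own storage needs your own numbers plugged in.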