Hacker News
by baldfat 4022 days ago
At 12 TB I think just getting the files would take days of downloading.

I have 25 Mbps, which works out to 1041:40:00 (hh:mm:ss). That assumes a perfect 25 Mbps connection with no errors or drops. 1041 hours is 43+ days.

http://www.numion.com/calculators/time.html
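For anyone who wants to redo the arithmetic without the calculator, here is a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and an ideal link with no overhead; the function name is made up for illustration. With those assumptions it gives about 1067 hours (44.4 days), slightly more than the 1041:40:00 above, presumably because the calculator uses a different unit convention.

```python
TB = 10**12    # decimal terabyte in bytes (assumption)
MBPS = 10**6   # megabit per second in bits/s (assumption)

def download_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: no protocol overhead, errors, or drops."""
    return size_bytes * 8 / rate_bits_per_s

seconds = download_time_seconds(12 * TB, 25 * MBPS)
print(f"{seconds / 3600:.0f} hours, {seconds / 86400:.1f} days")
```

Real-world throughput is usually lower than the advertised rate, so this is a lower bound.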

5 comments

Those who deal with climate data are used to huge data sizes and are usually equipped with fast internet connections. You can also work with the data within AWS. The files come split in logical 750MB parts. There are different aspects of the data which come separately.
I have 150Mbps, so I could do this in ((12TB / 18MB/s) / 3600) / 168, about 1.1 weeks.

I'm also supposedly completely unlimited on my package, I think that might push the limit though.

What I don't have is anything like 12TB of storage; all the systems at home combined come to maybe 2TB.

One could make the visualisation app work for a reduced data set and then keep adding to it whilst downloading. It would also make development easier. So you start with 2-3 years, make the app, write some scripts to automatically download and process what you need, and then add the other 147-148 years. This means... downloading only 164 - 245 GB of data... that'll keep me and my 10Mbps broadband busy for more than 2 days :)
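A rough sketch of where those numbers come from, assuming the 12 TB is spread evenly over 150 years of data and that 1 TB is counted as 1024 GB (both are assumptions; the helper names are hypothetical). It lands at roughly 164 to 246 GB for 2-3 years, and somewhere between 1.5 and 2.3 days on an ideal 10 Mbps link.

```python
TOTAL_GB = 12 * 1024   # 12 TB counted as 1024 GB per TB (assumption)
YEARS = 150            # assumed span of the full dataset

def subset_gb(years: int) -> float:
    """Size of a subset, assuming data is spread evenly across years."""
    return TOTAL_GB * years / YEARS

def download_hours(size_gb: float, mbps: float) -> float:
    """Ideal download time in hours (1 GB taken as 10^9 bytes here)."""
    return size_gb * 1e9 * 8 / (mbps * 1e6) / 3600

for years in (2, 3):
    size = subset_gb(years)
    print(f"{years} years: {size:.0f} GB, {download_hours(size, 10) / 24:.1f} days at 10 Mbps")
```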
All of the ISPs where I live have a 150GB per month cap with $10 for every 50GB over, so on top of the download time this would cost me roughly $2,500 in overage charges. I don't think I'll be getting this anytime soon.
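The overage estimate can be checked with a small sketch. It assumes the whole download lands in a single billing month and that a partial 50GB block is billed as a full one; neither detail is stated above, and the function name is made up. Depending on whether 12 TB is counted as 12,000 or 12,288 GB, it comes out around $2,400, in line with the rough figure quoted.

```python
import math

def overage_cost(total_gb: float, cap_gb: float = 150,
                 block_gb: float = 50, price_per_block: float = 10) -> float:
    """Overage charge, assuming partial blocks are billed as whole blocks."""
    over = max(0, total_gb - cap_gb)
    return math.ceil(over / block_gb) * price_per_block

print(overage_cost(12_000))   # 12 TB counted as 12,000 GB
print(overage_cost(12_288))   # 12 TB counted as 12 * 1024 GB
```

Spreading the download over several months would only save the cost of one cap's worth of data per month, so the total stays in the same ballpark.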
Would it go significantly quicker if you downloaded it to a VPS instead of your home?
That would help, but the cost of storage for 11 TB plus the uptime needed for the download would be extremely expensive. I haven't seen an option that works for less than around $200 to rent and set up.
It seems like if these large public datasets continue to come online, there will need to be some sort of semi-cooperative distributed data store to make them truly "accessible." Or the data provider will need to provide an access/query API, rather than just a big tank that you can copy if you dare.
BitTorrent might actually help as a tool to distribute these large data sets.

The big issue is that so few people would actually need to download these large datasets.

But possibly many more people could use access to the data while they use the apps that were made possible by the data.
$200 isn't all that much, if it lets you get results while you're still alive. :)
Tell that to my wife :)