Hacker News
by sunwooz 4020 days ago
I'm sure someone here is already making a visualization with the dataset.
2 comments

At 12 TB I think just getting the files would take days of downloading.

I have 25 Mbps, which works out to 1041:40:00 (hh:mm:ss). That assumes a perfect 25 Mbps connection with no errors or drops. 1041 hours is 43+ days.

http://www.numion.com/calculators/time.html

Those who deal with climate data are used to huge data sizes and are usually equipped with fast internet connections. You can also work with the data within AWS. The files come split into logical 750 MB parts. Different aspects of the data come separately.
I have 150 Mbps (about 18 MB/s), so I could do this in ((12 TB / 18 MB/s) / 3600) / 168, or about 1.1 weeks.

I'm also supposedly completely unlimited on my package, though I think that might push the limit.

What I don't have is anything like 12 TB of storage; all my systems at home combined come to maybe 2 TB.
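The transfer-time estimates in the comments above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a perfectly saturated link, and the function name `download_seconds` is made up for illustration. Because the commenters rounded or used different unit conventions, the results land close to their figures rather than exactly on them.

```python
TB = 10**12  # decimal terabyte in bytes (assumption; binary units give slightly larger numbers)

def download_seconds(size_bytes: float, link_mbps: float) -> float:
    """Ideal transfer time in seconds over a fully saturated link."""
    bytes_per_second = link_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return size_bytes / bytes_per_second

# Link speeds mentioned in the thread: 10, 25 and 150 Mbps.
for mbps in (10, 25, 150):
    secs = download_seconds(12 * TB, mbps)
    print(f"{mbps:>3} Mbps: {secs / 3600:7.0f} h = {secs / 86_400:5.1f} days = {secs / 604_800:4.1f} weeks")
```

Real downloads will be slower than this, since protocol overhead, retries and server-side throttling are all ignored.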

One could make the visualisation app work for a reduced data set and then keep adding to it whilst downloading. It would also make development easier. So you start with 2-3 years, make the app, write some scripts to automatically download and process what you need, and then add the other 147-148 years. This means downloading only 164-245 GB of data to start with... that'll keep me and my 10 Mbps broadband busy for more than 2 days :)
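A quick sketch of where a figure like 164-245 GB could come from. It assumes the 12 TB is spread evenly over 150 years of data and counts 1 TB as 1024 GB; both are guesses chosen because they roughly match the numbers in the comment above, not facts about the dataset. The helper names are hypothetical.

```python
TOTAL_GB = 12 * 1024   # 12 TB, counting 1 TB as 1024 GB (assumption)
TOTAL_YEARS = 150      # 2-3 years plus "the other 147-148" (assumes an even split per year)

def subset_gb(years: int) -> float:
    """Size in GB of a subset covering the given number of years."""
    return TOTAL_GB * years / TOTAL_YEARS

def days_at(gb: float, mbps: float) -> float:
    """Ideal download time in days for `gb` gigabytes over an `mbps` link."""
    seconds = gb * 1e9 * 8 / (mbps * 1e6)  # GB -> bits, divided by bits per second
    return seconds / 86_400

for years in (2, 3):
    gb = subset_gb(years)
    print(f"{years} years: {gb:.0f} GB, {days_at(gb, 10):.1f} days at 10 Mbps")
```

Under these assumptions the two-year subset takes about a day and a half and the three-year subset a little over two days, before any overhead.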
All of the ISPs where I live have a 150 GB per month cap with $10 for every 50 GB over, so it would also cost me about $2,500 in overage charges on top of the download time. I don't think I'll be getting this anytime soon.
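The overage estimate can be checked with a small calculation. This sketch assumes the whole 12 TB is pulled in a single billing month, that 12 TB means 12,000 GB, and that overage is billed in whole 50 GB blocks; none of those details are stated in the comment, and `overage_cost` is a hypothetical helper.

```python
import math

def overage_cost(total_gb: float, cap_gb: float = 150,
                 block_gb: float = 50, price_per_block: float = 10) -> float:
    """Overage charge in dollars, billing every started block above the cap."""
    over = max(0, total_gb - cap_gb)          # usage beyond the monthly cap
    blocks = math.ceil(over / block_gb)       # partial blocks count as full ones (assumption)
    return blocks * price_per_block

print(overage_cost(12_000))  # 12 TB as 12,000 GB, all in one month
```

With these assumptions the charge comes to $2,370, the same ballpark as the figure quoted above. Spreading the download over several months would lower it, since each month's cap is free.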
Would it go significantly quicker if you downloaded it to a VPS instead of your home?
That would help, but storage for 11 TB and the uptime needed for the download would be extremely expensive. I haven't seen an option that works for less than around $200 to rent and set up.
It seems like if these large public datasets continue to come online, there will need to be some sort of semi-cooperative distributed data store to make them truly "accessible." Or the data provider will need to provide an access/query API, rather than just a big tank that you can copy if you dare.
BitTorrent might actually help distribute these large data sets.

The big issue is that so few people would actually need to download these large datasets.

But possibly many more people could use access to the data while they use the apps that were made possible by the data.
$200 isn't all that much, if it lets you get results while you're still alive. :)
Tell that to my wife :)
I sure hope so because I am very very curious. Surely there are a lot of researchers that will love to jump into this data and manipulate it somehow. It's a good time to be studying climate change now... your dissertation is right here :D
> I sure hope so because I am very very curious.

Indeed. The image at the top of the page (presumably the worst case scenario) has no legend. I'm guessing that the red is desert, but what counts as temperate? The South America blue or the South Africa yellow? Or are the colors a delta from the current temperatures?

It's amazing how easy it is to turn a 12TB deluge of pristine scientific data into something completely meaningless.

My guess is that it shows the average temperature rather than actual climate. So red would be high temperature, and blue cold. I don't think that it refers to precipitation or anything like that. Or it could be a stock photo...
You'd think some spots on the globe would get colder or rainier, no? Also, this map makes it appear Antarctica will get colder? Or maybe that map is a stock image?
Assuming the data is accurate and hasn't been 'adjusted' or 'corrected.' The whole climate business is a corrupt mess. Raw data over the years has been 'fixed' and policy is being made from 'models.' Yet what nonpartisan organization checks the models? The IPCC certainly doesn't. It's all a scam. We had global cooling fears in the 1970s. Then warming. Now "climate change." Yet industrial carbon output has risen exponentially and consistently since the industrial revolution, yet temperatures haven't, thus calling into question that increased carbon dioxide raises temperatures. Such a disgusting mess.
This is an interesting post, because so much of what you say is contradicted by basic fact. I'm wondering how you can possibly defend yourself.

>We had global cooling fears in the 1970s.

No, we didn't. Climate scientists in the 1970s were predicting warming trends (the media just wasn't paying attention).

[0] http://skepticalscience.com/ice-age-predictions-in-1970s.htm

>Then warming. Now "climate change."

Nope, the two terms have been used in the scientific community for decades. In counts of term usage across the scientific literature, "climate change" has always been the more common of the two.

[1] http://skepticalscience.com/climate-change-global-warming.ht...

>Yet industrial carbon output has risen exponentially and consistently since the industrial revolution yet temperatures haven't

This is factually inaccurate according to several datasets published by independent scientific organizations around the world. How many do you want? Let's start with NASA, Japan, and satellite data:

[2] http://data.giss.nasa.gov/gistemp/

[3] http://ds.data.jma.go.jp/tcc/tcc/products/gwp/temp/ann_wld.h...

[4] http://nsstc.uah.edu/climate/

How many more do you want? There's also ocean heat content:

[5] http://www.nodc.noaa.gov/OC5/3M_HEAT_CONTENT/