Hacker News
by stephensonsco 2985 days ago
So you think the LHC should "publish" 100 petabytes of data?

What you are stating isn't practical because of cost.

But I'll definitely go with the idea that "if you want to make a claim as big as finding dark matter and be believed, then releasing your data is probably a good idea".

2 comments

http://opendata.cern.ch

> "Explore more than 1 petabyte of LHC data!"

And that's just the online free-for-all version. If you apply for access, you can analyse all of it through the LHC Computing Grid.

This is awesome. But it is still impractical. The researchers knee-deep in this already have to fight for computing power and data access (and error checking of all of that) for years to get anything meaningful out. It's just not as simple as "publish the data". I'm willing to be proved wrong on this, but it's a lot like saying: here's an iOS IDE and a server, recreate WhatsApp. It's impractical for many reasons, but I'll definitely grant that with enough resources (money/time/brainpower/data access) it is not impossible.

It's worse! The data for the Higgs can't be published because... it was destroyed!

The aggregate, processed signals are all that is retained at the LHC; the raw data was gone before it could be analysed.

Also they used a bloody hokey boosted classifier for the detection, but that's by the by now, apparently. And there were 12 events out of about 1 trillion, so all good there too...

That is not true, at least for CMS. All the RAW data taken in pp collisions during 2011 and 2012, i.e. the Higgs boson discovery dataset, has been saved on tape. As a general rule, we never delete RAW collision data that actually makes it into permanent storage. Of course, data that didn't pass the real-time selection to be recorded in the first place is irrevocably lost.

So all the data apart from the data that was thrown away?

Also very true. Unfortunately, this data is dirty, context-dependent, or plain missing. This is true of pretty much any "real" experiment.