

by AretNCarlsen 5452 days ago

I used to be the sysadmin for a high-energy physics lab as we prepared for the ATLAS experiment to come online. (It was a long wait, following helium explosions and such.) The reason you see so many different numbers is that they cannot possibly record the full flow of information. Collision sensor data is initially fed into a very large buffer at CERN and analyzed in real time to determine which chunks are likely to contain significant information. Those chunks are kept, and the rest are discarded. This bothered a lot of people, since interesting scientific data is probably being thrown away, but they are limited by current storage technology.
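To make the buffer-and-filter idea concrete, here is a toy sketch in Python. It is not how the real trigger works: the `Event` type, the single `total_energy_gev` field, the 100 GeV threshold, and the buffer size are all assumptions invented for illustration. It only shows the general shape of "buffer incoming data, decide quickly, keep a small fraction".

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    total_energy_gev: float  # hypothetical summary quantity, not a real ATLAS field


def looks_interesting(event, threshold_gev):
    # Stand-in for the real-time analysis: a single energy cut (assumed).
    return event.total_energy_gev >= threshold_gev


def select_events(stream, buffer_size=4, threshold_gev=100.0):
    """Buffer incoming events, then keep or discard each one as the buffer drains."""
    buffer = deque()
    kept = []
    discarded = 0

    def flush():
        nonlocal discarded
        while buffer:
            event = buffer.popleft()
            if looks_interesting(event, threshold_gev):
                kept.append(event)
            else:
                discarded += 1  # gone for good; nothing is written to storage

    for event in stream:
        buffer.append(event)
        if len(buffer) == buffer_size:
            flush()
    flush()  # drain whatever is left when the stream ends
    return kept, discarded


energies = [12.0, 250.0, 40.0, 99.9, 100.0, 730.0, 5.0]
events = [Event(i, e) for i, e in enumerate(energies)]
kept, discarded = select_events(events)
print([e.event_id for e in kept], discarded)
```

The point of the sketch is the irreversibility: anything that fails the quick check is never stored, which is exactly why the choice of selection criteria bothered people.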

Further preliminary analysis is performed on the retained data, broadly categorizing the energy and other characteristics of the collision. That allows individual physics groups around the world to download only the data that is likely to pertain to their specific research, e.g. the Higgs boson, multiple dimensions, etc.
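As a rough illustration of that categorization step, the sketch below tags retained events with a coarse label and lets a group pull only the labels it cares about. The category names, the energy boundaries, and the function names are all hypothetical; the real classification used far more than one number per collision.

```python
from collections import defaultdict


def categorize(energy_gev):
    # Hypothetical coarse bins; the boundaries are made up for this example.
    if energy_gev < 200.0:
        return "low"
    if energy_gev < 1000.0:
        return "medium"
    return "high"


def build_index(events):
    """Map each category to the IDs of the retained events that fall into it.

    `events` is an iterable of (event_id, energy_gev) pairs.
    """
    index = defaultdict(list)
    for event_id, energy_gev in events:
        index[categorize(energy_gev)].append(event_id)
    return dict(index)


def events_for_group(index, wanted_categories):
    """Return the sorted event IDs a group would download for its categories."""
    return sorted(
        event_id
        for category in wanted_categories
        for event_id in index.get(category, [])
    )


retained = [(0, 150.0), (1, 450.0), (2, 2500.0), (3, 199.9), (4, 1000.0)]
index = build_index(retained)
print(events_for_group(index, ["high"]))
print(events_for_group(index, ["low", "medium"]))
```

A group that only studies the highest-energy collisions would fetch the "high" list and never touch the rest, which is what keeps the per-group download manageable.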

There was some talk of transferring data via BitTorrent or perhaps a custom protocol involving fountain codes, but that never got off the ground. Instead, the Russians were working on a custom peer-to-peer system with a monolithic centralized set of indices, a system which is hopefully working better than it used to.

P.S. - Here's a hummingbird-speed video of building our prototype fileserver node for local physics analysis of ATLAS data [before I learned about electric screwdrivers]: http://www.youtube.com/watch?v=8y6MpPNqxmw