Hacker News
by dekhn 2442 days ago
it's pretty simple. the physical data acquisition devices (ATLAS is an example) collect data at rates in the 100s of terabytes/sec (https://home.cern/science/computing/processing-what-record)

No storage system can store that data (and most of it is not useful) so they have a series of hardware triggers and buffers that reduce the data down to roughly what modern (general purpose) hardware is capable of handling. They tune the thresholds to match what consumer hardware is capable of.

With regard to supercomputer filesystems: nobody wants to use GPFS. CERN's EOS sustained (theoretical) 3.3TB/sec in Apr 2015, so it's not like they're uncompetitive with the largest supercomputer...


I know how data collection works, but it sounded as if 25GB/s was regarded as high compared with filesystems you can buy.

Obviously some people do want GPFS, if they can afford it, but Cori uses Lustre. I don't mean to claim that either is ideal for streaming high rate event data, of course.

> Obviously some people do want GPFS, if they can afford it, but Cori uses Lustre

The data model at CERN does not match that of a supercomputer. CERN data are not processed locally but distributed to ~100 participating institutes in the experiment.

Moreover (personal opinion), GPFS is crap. It's an old relic from the 90s with so many quirks and design problems that it would deserve an entire conference of its own. Plus, it's proprietary and expensive.

The only reason GPFS is still alive is that, for a long time, the only alternative was Lustre, and Lustre is even worse.

Lustre is crap.

At every single supercomputer meeting I've been to (I've been part of the community for years; they often invite me to their meetings to give an industry perspective), people are just continuously complaining about the filesystems, and GPFS and Lustre are at the top of the list.

What filesystems would they like to be using?