

by karies 3716 days ago

Background upfront: I'm the guy behind the C++ interpreter and ROOT's new interfaces. I'm the co-author of the only surviving C++ reflection proposal and the author of the std::variant proposal. I have contributed to the C++ Core Guidelines (http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines https://youtu.be/1OEu9C51K2A).

* HEP stores about 0.5 exabytes of data in ROOT format, that's almost exclusively serialized objects that do not know anything about TObject.

* XRootD is not really specific for ROOT files. A better example would maybe be our JavaScript de-serialization library, https://root.cern.ch/js/

* No way will the Python binding be dropped. I wonder where you got that rumor from. About one third of our users use it.

* HEP is limited by CPU resources, which is part of the reason why HEP decided to use a close-to-bare-metal language for the number crunching part.

* We just made it more straightforward to use Python and R multivariate analysis tools with ROOT data.

* We have people from genomics etc coming to ask for help, because they cannot find a system that scales as well as ROOT does.

And then we have a different perception of the direction out there. I see that Hadoop was nice but slow, Spark is nice but slow, so now things are moving to C++, see e.g. ScyllaDB. There is no reason for us to move away from it, but every reason to make it more usable.

And yes, I agree that this is an issue. But many physicists do not.

5 comments

batbomb 3715 days ago

* ROOT files still have terrible documentation. Rene throws up his arms in protest any time people say this (I've personally witnessed it).

* Physicists still don't like pyroot interfaces, otherwise rootpy wouldn't exist.

* astropy is proof that you can be performant and user friendly. Julia is proof that you don't even need a C++ library underneath.

* Saying ROOT scales well is weird. It is true that ROOT and the ROOT IO/ROOT files are efficient, but additional services are what have helped it scale (dCache, XRootD, batch farm/grid/DIRAC, etc.).

* Not sure what the ScyllaDB tangent has to do with anything. There are scalable open source RDBMS options out there too, like CitusDB and Greenplum, which support UDFs. Hadoop and Spark with HDFS are still great for certain applications and as general data analysis tools, but it's tricky to get them to perform well without HDFS, and the grid model of computing doesn't lend itself well to that paradigm.

* I've heard the C++ interpreter is much better with Cling (if that's you, I applaud your effort!). CINT was a gun that fired in both directions for every grad student I ever had to help.

* XRootD has little to do with ROOT anymore other than it also implements the original root protocol.

* ROOT is not modular. It is both an application and a collection of libraries and somewhat of a VM. That does make some things convenient, but it also makes some things extremely hard.

There are many reasons to move away from ROOT, and the astrophysics community is a prime example of that!


alxprc 3716 days ago

Thanks for clarifying. You're right that I was too broad, and it's certainly true that many physicists don't share my opinion (I'm working on that).

Speed is always a concern, but I don't think it dictates that C++ should be the primary ‘user-facing’ interface. Numpy is fast, but it doesn't sacrifice a nice API to achieve it.

Personally, a big difference is that a lot of the Python packages feel fast to use and, most importantly, to write. ROOT can be fast to execute, no question, but I feel like I'm fighting against it (and I'm sorry that's very vague and qualitative).

It would be very interesting to hear more about the genomics use-case, and how they evaluated the other options.


whyever 3716 days ago

I'm using Python for analysis, and I'm running into performance issues constantly.


pwang 3715 days ago

If you want easy scale-out and scale-up with Python, check out the (relatively) new library Dask: http://dask.readthedocs.org


konschubert 3714 days ago

The thing that bothers me most about ROOT is that some parts of it are basically not maintained at all.

There are serious bugs in RooFit which haven't been fixed in years. Wouter Verkerke has abandoned it (from what I can tell). Lorenzo Moneta is fixing the worst potholes, but it seems he has no authority or no time to tackle the misleading interface and the broken scaffolding of RooFit.

Maybe ROOT7 will be a chance to take ownership of RooFit again.


jhbadger 3716 days ago

Have there been any success stories with genomics and ROOT? About 10-15 years ago the group I was with at the time explored ROOT because the alternatives (Perl, early versions of R, etc.) weren't very attractive. We didn't end up going with ROOT ourselves for a variety of reasons, but did anyone else in the field do so?
