
by alxprc 3714 days ago

I am a particle physicist, and used to use ROOT every working day. It is still used daily by thousands of other particle physicists, though, and is a core part of many high-energy physics experiments.

I think there are a few objectively neat features of ROOT:

* Versioned persistency of C++ objects deriving from the TObject base class [1];

* Script-like execution of C++ and a C++ REPL based on clang [2]; and

* Dynamic bindings of the C++ classes to Python [3].
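
To illustrate the idea behind the first point, here is a minimal sketch of versioned persistency in plain Python. It is an analogy only and not how ROOT implements it: the `Track` class, its fields, and the `CLASS_VERSION` attribute are hypothetical names, and JSON stands in for ROOT's file format. The point is that the schema version is written next to the data, so that a newer reader can still load records written by an older version of the class.

```python
import json


class Track:
    """Hypothetical class; CLASS_VERSION plays the role of a schema version."""

    CLASS_VERSION = 2  # assumption: version 1 had no `pz` member

    def __init__(self, px, py, pz=0.0):
        self.px = px
        self.py = py
        self.pz = pz

    def to_json(self):
        # Write the schema version alongside the data members.
        return json.dumps(
            {"version": self.CLASS_VERSION, "px": self.px, "py": self.py, "pz": self.pz}
        )

    @classmethod
    def from_json(cls, text):
        record = json.loads(text)
        version = record.get("version", 1)
        if version > cls.CLASS_VERSION:
            raise ValueError(f"record version {version} is newer than this class")
        # Records written by version 1 have no `pz`; fall back to a default.
        pz = record["pz"] if version >= 2 else 0.0
        return cls(record["px"], record["py"], pz)


# A record written by the (hypothetical) old version of the class still loads.
old_track = Track.from_json('{"version": 1, "px": 1.0, "py": 2.0}')

# A record written by the current version round-trips unchanged.
new_track = Track.from_json(Track(1.0, 2.0, 3.0).to_json())
```

The reader decides how to upgrade old records, so files written years ago stay readable after the class changes.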

There's an accompanying, but independently developed, file access protocol for reading and writing ROOT files over a network, too [4].

On the other (subjective) hand, ROOT is regarded as a pain to use by ‘analysts’, the people who use ROOT to make the results that go into physics papers. There are already some good, old-but-still-valid critiques [5, 6], so I won't say too much, but I think a large part of the problem comes from two things:

1. ROOT tries its best to do everything that a particle physicist might want to do. This encompasses a very wide range of things, and it has led to ROOT having a very large, often intractable codebase that cannot be modularised.

2. It has failed to keep up with contemporary coding techniques and analysis methods. Most of the PhD students I know use the Python interface to ROOT, and yet the ROOT developers are planning to drop Python support for the next major version (ROOT 7, which is expected in 2018). Those that do use C++ aren't able to use even C++11 effectively with ROOT, as its interfaces aren't compatible.

Luckily, I'm confident that analysts will move to a better way. I've been very encouraged by the astrophysics and machine learning communities in particular, who are using Python to do low- and high-level analysis on large datasets, as we do in particle physics, and are producing fantastic results. Tools like pandas, matplotlib, and scikit-learn are an absolute joy to use in comparison with ROOT, and the communities within the Python ecosystem are wonderful: they foster very open code development, and value readable, well-documented, fast code.

I don't need ROOT to get any better, because I think the future is already here.

[1]: https://root.cern.ch/root/html534/guides/users-guide/InputOu...

[2]: https://root.cern.ch/cint-prompt

[3]: https://root.cern.ch/pyroot

[4]: http://xrootd.org

[5]: http://www.insectnation.org/articles/problems-with-root.html

[6]: http://www.insectnation.org/articles/root-wishlist.html

1 comments

karies 3714 days ago

Background upfront: I'm the guy behind the C++ interpreter and ROOT's new interfaces. I'm the co-author of the only surviving C++ reflection proposal and the author of the std::variant proposal. I have contributed to the C++ Core Guidelines (http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines https://youtu.be/1OEu9C51K2A).

* HEP stores about 0.5 exabytes of data in ROOT format; that's almost exclusively serialized objects that do not know anything about TObject.

* XRootD is not really specific to ROOT files. A better example would maybe be our JavaScript de-serialization library, https://root.cern.ch/js/

* No way will the Python binding be dropped. I wonder where you got that rumor from. About one third of our users are using it.

* HEP is limited by CPU resources, which is part of the reason why HEP decided to use a close-to-bare-metal language for the number crunching part.

* We just made the use of python and R multivariate analysis tools with ROOT data more straightforward.

* We have people from genomics etc coming to ask for help, because they cannot find a system that scales as well as ROOT does.

And then we have a different perception of the direction out there. I see that Hadoop was nice but slow, Spark is nice but slow, so now things are moving to C++, see e.g. ScyllaDB. There is no reason for us to move away from it, but every reason to make it more usable.

And yes, I agree that this is an issue. But many physicists do not.


batbomb 3714 days ago

* ROOT files still have terrible documentation. Rene throws up his arms in protest anytime people say this (I've personally witnessed this)

* Physicists still don't like pyroot interfaces, otherwise rootpy wouldn't exist.

* astropy is proof that you can be performant and user friendly. Julia is proof that you don't even need a C++ library underneath.

* Saying ROOT scales well is weird. It is true that ROOT and ROOT I/O (ROOT files) are efficient, but it is additional services that have helped it scale (dCache, XRootD, batch farm/grid/DIRAC, etc.).

* Not sure what the ScyllaDB tangent has to do with anything. There are scalable open source RDBMS options out there too, like CitusDB and Greenplum, which support UDFs. Hadoop and Spark with HDFS are still great for certain applications and as general data analysis tools, but it's tricky to get them to perform well without HDFS, and the grid model of computing doesn't lend itself well to that paradigm.

* I've heard the C++ interpreter is much better with Cling (if that's you, I applaud your effort!). CINT was a gun that fired in both directions for every grad student I ever had to help.

* XRootD has little to do with ROOT anymore other than it also implements the original root protocol.

* ROOT is not modular. It is both an application and a collection of libraries and somewhat of a VM. That does make some things convenient, but it also makes some things extremely hard.

There are many reasons to move away from ROOT, and the astrophysics community is a prime example of that!


alxprc 3714 days ago

Thanks for clarifying. You're right that I was too broad, and it's certainly true that many physicists don't share my opinion (I'm working on that).

Speed is always a concern, but I don't think it dictates that C++ should be the primary ‘user-facing’ interface. Numpy is fast, but it doesn't sacrifice a nice API to achieve it.
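
As a rough sketch of that point, assuming NumPy is installed: the two functions below compute the same transverse momentum from momentum components, once with an explicit Python loop and once with a single vectorised NumPy call that runs in compiled code. The function names and the sample values are made up for illustration.

```python
import math

import numpy as np


def pt_loop(px, py):
    """Transverse momentum computed element by element in pure Python."""
    return [math.sqrt(x * x + y * y) for x, y in zip(px, py)]


def pt_vectorised(px, py):
    """Same quantity as pt_loop, computed for the whole array in one call."""
    return np.hypot(px, py)


# Made-up momentum components for three particles.
px = np.array([3.0, 0.0, 1.0])
py = np.array([4.0, 2.0, 1.0])

pt_slow = pt_loop(px, py)
pt_fast = pt_vectorised(px, py)
```

Both versions give the same numbers; the vectorised one is the shorter to write and is typically much faster on large arrays, because the loop happens inside NumPy and not in the interpreter.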

Personally, a big difference is that a lot of the Python packages feel fast to use and, most importantly, to write. ROOT can be fast to execute, no question, but I feel like I'm fighting against it (and I'm sorry that's very vague and qualitative).

It would be very interesting to hear more about the genomics use-case, and how they evaluated the other options.


whyever 3714 days ago

I'm using Python for analysis, and I'm running into performance issues constantly.


pwang 3713 days ago

If you want easy scale-out and scale-up with Python, check out the (relatively) new library Dask: http://dask.readthedocs.org


konschubert 3712 days ago

The thing that bothers me most about ROOT is that some parts of it are basically not maintained at all.

There are serious bugs in RooFit which haven't been fixed in years. Wouter Verkerke has abandoned it (from what I can tell). Lorenzo Moneta is fixing the worst potholes, but it seems he has no authority or no time to tackle the misleading interface and the broken scaffolding of RooFit.

Maybe ROOT7 will be a chance to take ownership of RooFit again.


jhbadger 3714 days ago

Have there been any success stories in regard to genomics and ROOT? About 10-15 years ago the group I was with then explored ROOT as the alternatives (Perl, early versions of R, etc.) weren't very attractive. We didn't end up going with ROOT ourselves for a variety of reasons, but did anyone else in the field do so?
