Hacker News

by lokimedes 2895 days ago

We (I was a member of the ATLAS Experiment, 2008-2016) used neural networks for trigger decisions (whether to record or ignore a collision), and Boosted Decision Trees were the big thing among many ATLAS physicists around 2008-9, so they were also used quite a bit. For the experiments at the LHC, you can consider the actual analysis of the data a multivariate hypothesis-testing exercise. The whole thing is a counting experiment: you have a theory that provides a prediction, you simulate its phenomenological effects, the reactions' energy deposition in the detector(s) and the electronic signal paths, and then you run "reconstruction", which basically turns electronic readouts into particle trajectories, particle energies and particle types. Under the laws of physics, these clues can be combined to measure the rest mass of initial particles that have long since decayed (like the Higgs). Given how difficult it is to tell apart the many particles leaving energy behind, it is hard to separate the "background" from known physics from the interesting "signal" predicted by a new theory under test. Rather than doing the analysis by hand, Machine Learning is applied at multiple stages. This can be particle identification (is it a muon, an electron or something else?) or maximising the binary separation between two classes, such as signal and background.
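To make the counting-experiment framing concrete, here is a minimal Python sketch (not ATLAS code) of one common step: scanning a cut on a classifier output, such as a BDT score, and keeping the threshold that maximises the expected discovery significance. It assumes you already have classifier scores for simulated signal and background events, and it uses the Asimov approximation Z = sqrt(2((s+b)ln(1+s/b) - s)) for a counting experiment without systematic uncertainties. The function names and the weight parameters are hypothetical.

```python
import math

def asimov_significance(s: float, b: float) -> float:
    """Median expected discovery significance for a counting experiment
    with expected signal s and background b (no systematic uncertainties)."""
    if b <= 0:
        raise ValueError("expected background must be positive")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def best_cut(sig_scores, bkg_scores, sig_weight=1.0, bkg_weight=1.0):
    """Scan thresholds on a classifier score; return (cut, Z) with the largest Z.

    sig_weight / bkg_weight (hypothetical) scale simulated event counts to the
    expected yields for the recorded luminosity.
    """
    best_threshold, best_z = None, 0.0
    for cut in sorted(set(sig_scores) | set(bkg_scores)):
        # Keep events whose score is at or above the threshold.
        s = sig_weight * sum(1 for x in sig_scores if x >= cut)
        b = bkg_weight * sum(1 for x in bkg_scores if x >= cut)
        if s <= 0 or b <= 0:
            # Skip cuts with no signal or no background; the formula diverges at b = 0.
            continue
        z = asimov_significance(s, b)
        if z > best_z:
            best_threshold, best_z = cut, z
    return best_threshold, best_z

# Toy scores standing in for BDT outputs on simulated events.
signal = [0.9, 0.8]
background = [0.1, 0.85]
cut, z = best_cut(signal, background)
print(f"best cut: {cut}, Z = {z:.2f}")
```

For s much smaller than b the formula reduces to the familiar s/sqrt(b). A real analysis would also account for systematic uncertainties, which this sketch ignores.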

In general, it is worth knowing that Machine Learning, Big Data and Cloud computing have been used in particle physics for decades, and with the LHC a worldwide infrastructure has been created that captures many of the lessons the mainstream is only now beginning to discover. For instance, a main paradigm in the analysis model is to move the calculation to the data rather than the other way around, because of the large amounts of data. You may call it MapReduce; we call it physics analysis (map your statistical analysis across decentralised data, reduce the output through distributed merge jobs, plot and publish). Sorry to sound like an old fart; your question is honest and relevant, but it really underlines how easily a story about how Google/Facebook/whatever invented something can rewrite history. Most of what people in the IT/Tech sector are playing around with is inspired by basic or applied science and then applied in a commercial setting. This is exactly how it is supposed to work, but damned if the log-analysing marketeers at Google should get all the credit for these developments :)
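The "move the calculation to the data" pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the LHC grid software: each shard of events is histogrammed where it lives (map), and because histograms with identical binning merge by adding bin counts, the partial results can be combined in any order by merge jobs (reduce). The per-event quantity (a mass in GeV) and the bin width are hypothetical.

```python
from collections import Counter
from functools import reduce

BIN_WIDTH = 10.0  # hypothetical binning, in GeV

def map_histogram(events):
    """Map step: runs next to one shard of data and returns a partial histogram."""
    hist = Counter()
    for mass in events:
        hist[int(mass // BIN_WIDTH)] += 1  # bin index -> count
    return hist

def merge_histograms(h1, h2):
    """Reduce step: histograms with identical binning merge by adding counts."""
    return h1 + h2

# Three shards standing in for datasets stored at different sites.
shards = [[120.5, 125.1, 90.2], [124.9, 91.0], [126.3]]
partials = [map_histogram(shard) for shard in shards]  # in reality, run in parallel where the data lives
total = reduce(merge_histograms, partials, Counter())
print(sorted(total.items()))
```

Only the small histograms travel over the network, never the raw events, which is the whole point when the data is too large to move.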

Now, with my rant over, here are a few references that may be interesting to you:

These were the tools used for physics analysis:

http://tmva.sourceforge.net
https://root.cern.ch
https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome

And a few articles:
http://atlas.cern/search/node/Boosted%20Decision%20tree
https://cds.cern.ch/search?ln=en&sc=1&p=Machine+Learning&act...

Oh and a bit of gossip. We called Sau Lan Wu the "Dragon lady" (mostly behind her back), because of her awesome energy and tenacity. She really deserves the credit given in the article!


konschubert 2895 days ago

Sadly, ML research in physics and ML research in the rest of the world are extremely disconnected, to the point where one side has never heard of the tools that the other side uses all day.

I think the blame here is mostly on the science community which isn't paying much attention to ML tooling, best practices and research and instead keeps reinventing the wheel, over and over again.

guitarbill 2895 days ago

> I think the blame here is mostly on the science community which isn't paying much attention to ML tooling, best practices and research

While the science community doesn't have a great track record for quality software engineering, that's an awfully arrogant position.

Most ML tooling sucked, and it's only just getting better in terms of usability. But even then it's very software engineer-y in the worst kind of way, e.g. "Coming soon: PyTorch 1.0 ready for research and production" [0]. Great.

If you're doing research in one field, you don't really want to spend the time to become an expert in another one just to do some analysis. What you want is tools you can reliably (ab)use, like maths. But there often isn't a straightforward way of getting the uncertainties on values output from many ML constructs.
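One generic, if expensive, way to put an uncertainty on the output of a fitted model is the bootstrap: refit on many resampled copies of the training data and look at the spread of the predictions. The Python sketch below assumes a plain least-squares line as a stand-in for "an ML construct"; the function names are hypothetical, and for large models refitting hundreds of times can be prohibitively costly. The fixed random seed keeps the result reproducible.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_prediction(xs, ys, x0, n_boot=1000, seed=0):
    """Refit on resampled training sets; return mean and spread of the prediction at x0."""
    rng = random.Random(seed)  # fixed seed so reruns give identical numbers
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, skip it
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x0)
    return statistics.fmean(preds), statistics.stdev(preds)

xs = list(range(20))
ys = [2.0 * x + 1.0 + (0.5 if x % 2 else -0.5) for x in xs]  # toy data with alternating noise
mean, spread = bootstrap_prediction(xs, ys, x0=25.0)
print(f"prediction at x=25: {mean:.2f} +/- {spread:.2f}")
```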

Yes, the term "statistical learning" has been around since at least 2001. But it isn't widely known/talked about/understood, and most trendy ML "tutorials" gloss over it completely. Maybe this is unfair criticism. After all, most ML applications in software don't require that stricter treatment, and why should somebody playing around with ML be burdened with this rigorousness? At the same time, it's easy to come away from ML thinking "I don't understand this at all, it's a black box, it doesn't do what I need it to".

And we haven't even talked about what a pain reproducibility is in ML.

> instead keeps reinventing the wheel, over and over again.

If people keep reinventing it, maybe the problem isn't the people... yeah, physicists don't write great code (guilty), but ML tooling is full of hype and currently feels a bit Javascript-y.

[0] https://pytorch.org/2018/05/02/road-to-1.0.html

lokimedes 2895 days ago

That may have changed a bit now. I'm no longer part of the physics community, but it seems that physicists, at least, are relying more and more on mainstream tools. The outward flow of knowledge comes more from ex-physicists like myself who work in industry. Most of us work in Tech, Fintech or other ML/Stats-driven industries, where many reimplement what they learned during their physics days.

dguest 2895 days ago

She's not talking about TMVA or RooStats when she says "Machine Learning": those would be "MVA" and "Statistics tools" in our jargon. She's talking about XGBoost[1].

[1]: https://xgboost.readthedocs.io/en/latest/