by josemariaruiz 4940 days ago

Philipp K. Janert, author of «Data Analysis with Open Source Tools», spends a few pages explaining how he perceives this «difference».

From his point of view, Machine Learning is a fake science. Fragile, secretive and specific techniques for big problems, whose application needs secret parameters that have never been published. These parameters will be supplied to you for a price by the inventors-researchers' companies.

On the other hand, statistics is real science, where everything is published and studied by a whole community. A science that has accumulated hundreds of years of experience and that offers all its knowledge in any university. The methods offered by statistics are broadly applicable, robust and open.

And I think he has a point in this reasoning.

PS: Statistics works; ask in any of the hard sciences. Its contributions have been essential to science over the last few centuries. Machine Learning was bashed (like old AI) because it never offered real solutions or helped us advance our understanding of anything. Machine Learning is a tool, not a science, that tries to cope with the limitations of our knowledge. That makes it a very convenient tool for engineers and problem solvers, as numerical methods are, but it also means that its results share the problems of numerical methods.

4 comments

rm999 4939 days ago

I disagree with a few things about your comment. Your criticism of machine learning feels off-base, and is too specific to describe such a wide field. Who sells parameters? What does that even have to do with whether it is a science? I can think of other fields where every detail of an experiment isn't spelled out in every paper.

It's hard to separate machine learning and statistics because so much of machine learning derives directly from statistics. Motivation is probably the most important distinction; machine learning is applied statistics. I'd say it's a mix of science (the scientific method plays a big part in model building for example), engineering, and math. Statistics is first and foremost a branch of mathematics, not science; the scientific method does not play a role in the vast majority of the field.

alook 4939 days ago

> machine learning is applied statistics

It really is hard to separate ML & statistics - any competent practitioner of ML appreciates the statistical achievements that made Machine Learning methods possible. And statisticians must understand that to help automate decision-making systems, using learning methods/boosting is a viable option.

The debate around nomenclature (ML/stats/AI) seems limited to the academic community. Most data scientists I've met tend to accumulate a repertoire of tools from different fields, rather than side with either the Machine Learning or the Statistics community.

east2west 4939 days ago

As a computer science major working in a top statistics department, I don't see such a sharp divergence. Statistics does not magically work well in the hard sciences, and machine learning has worked very well in many fields. I don't know if you consider biology a hard science, but biostatistics has many pitfalls that many sciences fall into, and highly regarded statisticians can come to opposite conclusions on the same data.

Statistical theory is actually applied probability, often called mathematical statistics. Math majors would feel right at home. In fact, math majors look down on statistics majors for lack of rigor, and they have a point. Two years of classes in a five-year PhD program isn't enough to perfect one's knowledge of measure theory. But the more important problem is that stats training leaves out enough computing that it affects the work coming out of stats departments.

This gets back to the difference between machine learning and statistics. Machine learning research embraces all fields of engineering, approximation and estimation, numerical analysis, optimization, plus statistics. Since so much scientific advance can be attributed to computational improvements, it is natural that the more computationally oriented fields are ahead of the less computational ones. LASSO has been all the rage in statistics recently, even though it relies largely on work in convex programming. And the signal processing communities in EE and CS are leagues ahead of statisticians in the sophistication and scale of the problems they can tackle. Computational statistics is an attempt to remedy the computational shortcomings of traditional statistics, but we have yet to see visible impact in terms of high-impact work from statistics departments.

Having said all that, computer science departments do have the problem of not fully understanding the statistical foundations of machine learning methods. But this is not the case for CS in prestigious schools such as Berkeley, Stanford and MIT. Work coming out of these places is theoretically sound yet application oriented. One just needs to look at NIPS papers to appreciate the breadth and depth of expertise available in the machine learning community.

For good reading on bridging statistics and machine learning, read the paper by an inventor of random forests, Leo Breiman, "Statistical Modeling: The Two Cultures." It is a well-regarded paper by a renowned probabilist and statistician who cares about the utility of statistics as used in the real world.

mjw 4939 days ago

> Fragile, secretive and specific techniques ... parameters will be supplied to you for a price by the inventors-researchers' companies

This seems a strange/dated/paranoid view of machine learning. Perhaps it has been true historically (have any references?), but it doesn't ring true to me for the field these days, as I've seen it.

Hyper-parameter selection can be tricky, and some papers do handwave about it when evaluating models, although this kind of flaw is increasingly picked up on by reviewers I think.

At any rate you'll find a lot of useful literature on hyper-parameter optimisation techniques especially for the more popular and general ML models. It's recognised as an important and interesting (albeit sometimes hard and fiddly) problem, not something to be swept under the rug, and not the stuff of conspiracies.
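To make the point concrete, here is a minimal sketch of one of the simplest published approaches: grid search with k-fold cross-validation. Everything in it is an assumption chosen for illustration (the one-parameter ridge model, the synthetic data, the grid of candidate values, and the hypothetical helper names `fit_ridge_1d`, `cv_score` and `select_lambda`); it is not taken from any particular paper or library.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w*x (no intercept): w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w*x against y."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, lam, k=5):
    """Mean validation MSE over k folds; fold i holds out indices i, i+k, i+2k, ..."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse(w, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / k

def select_lambda(xs, ys, grid, k=5):
    """Score every candidate hyper-parameter and return the best one plus all scores."""
    scores = {lam: cv_score(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (an assumption for the demo): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

grid = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = select_lambda(xs, ys, grid)
print("selected lambda:", best_lam, "cv mse:", scores[best_lam])
```

The whole procedure, including the grid and the fold assignment, is reproducible from what is written down, which is the opposite of a secret parameter. More elaborate schemes (random search, Bayesian optimisation) change how candidates are proposed, not the basic idea of scoring them on held-out data.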

dbecker 4939 days ago

Most of this is true in a strict sense, and it's disappointing that it is presented in such a judgmental way.

Machine learning typically focuses on prediction. There are lots of business problems where prediction is the #1 goal, and ML is great in these circumstances.

Statistics typically focuses on understanding and summarizing data/findings. This is frequently closer to the needs of scientists.

Accepting that classical statistics has contributed more to science than machine learning doesn't make it better. That's like saying "A pipe-wrench is a better tool than a hammer, just ask any plumber."

The fields have a lot of similarities, but different use cases. Most claims that one is "better" come from a tight focus on specific use cases.
