by wish5031 2491 days ago
At the beginning the authors claim they're not just restating the Stone-Weierstrass theorem (any continuous function on a compact set can be approximated arbitrarily well by a polynomial), but reading through, I'm not sure what they're proving beyond that. In fact, on pages 6-7 they simply appeal to that theorem before stating "NNs can loosely be viewed as a form of polynomial regression". Most of the rest of the paper compares polyreg and NNs on various datasets. Sec. 9 doesn't have anything especially novel in it. For instance, in 9.3 they say they'll explore the causes of overfitting in the context of their "NN <-> PR" principle, but never actually do so...
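
To make the theorem concrete: on [0, 1], Bernstein polynomials give a constructive proof of the Weierstrass approximation theorem. The plain-Python sketch below approximates a continuous function with a kink, |x - 0.5| (an illustrative choice, not anything from the paper), and estimates the uniform error on a grid; the helper names are hypothetical. For this target the worst error sits at the kink and shrinks roughly like 1/sqrt(n), so it about halves each time the degree quadruples.

```python
from math import comb


def bernstein_approx(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at a point x in [0, 1].

    B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x**k * (1 - x)**(n - k)
    """
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))


def max_error(f, n, grid_size=201):
    """Estimate the sup-norm error of B_n(f) on [0, 1] by sampling a uniform grid."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein_approx(f, n, x)) for x in xs)


def target(x):
    # Continuous on [0, 1] but not differentiable at 0.5 (illustrative choice).
    return abs(x - 0.5)


for n in (4, 16, 64, 256):
    print(n, max_error(target, n))
```

Nothing here is specific to neural nets or to polynomials as "the" basis; it only shows what the theorem itself guarantees.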

Polynomial regression is nice because it's a little easier to interpret, and it's also a convex (least-squares) problem with a single, global minimum. OTOH you have to design features yourself; otherwise, for high-dimensional problems, polyreg quickly requires way too much memory and compute to solve in a reasonable amount of time, because the number of polynomial terms explodes with the input dimension and the degree.
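
To put rough numbers on both points, here is a plain-Python sketch assuming "polyreg" means ordinary least squares on monomial features (the paper's exact setup may differ). `n_poly_features` counts the monomials of total degree at most d in p inputs, which is C(p + d, d), and `polyfit_1d` solves the convex least-squares problem in one shot via the normal equations. All names are hypothetical helpers; in practice a QR- or SVD-based solver is preferred over the normal equations for numerical stability.

```python
from math import comb


def n_poly_features(p, d):
    """Count the monomials of total degree <= d in p variables (constant included).

    This equals C(p + d, d), the number of columns in a full polynomial design matrix.
    """
    return comb(p + d, d)


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def polyfit_1d(xs, ys, degree):
    """Least-squares polynomial fit of one input via the normal equations X^T X w = X^T y.

    The squared-error objective is convex in w, and when X has full column rank the
    minimiser is unique, so solving one linear system finds the global optimum.
    """
    X = [[x**j for j in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 - 2.0 * x + 3.0 * x**2 for x in xs]
print(polyfit_1d(xs, ys, 2))  # close to [1.0, -2.0, 3.0]: the data is exactly quadratic

for p in (10, 100, 1000):
    print(p, n_poly_features(p, 3))  # 286, 176851, 167668501
```

With degree 3, going from 10 to 1000 inputs takes the design matrix from 286 to about 1.7e8 columns, which is the memory and compute blow-up that hand-designed features are meant to avoid.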

This paper might have been more interesting if it had somehow connected neural nets to those facts, or if it had shown how insight into the way neural nets work can be used to improve polyreg. But (admittedly, from a brief reading) I don't see anything like that here.


I agree. All I see here is: "We know any reasonably smooth function can be approximated by a neural net, likewise it can be approximated by a polynomial. So they're the same!" Imagine how it's going to blow their minds when they find out about Fourier transforms or wavelets!

There are lots of ways to approximate functions; the property of NNs that makes them attractive for ML isn't the universal approximation theorem. It's that there's a fast, robust method of training them that's easy to implement, easy to vectorize, easy to parallelize, and easy to customize for different applications.
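
A minimal sketch of that training recipe, assuming it refers to backpropagation plus gradient descent: a one-hidden-layer tanh network fit to y = x^2 in plain Python. The function name `train_mlp`, the architecture, and the hyperparameters are illustrative assumptions. The whole method is a forward pass, the chain rule applied backwards, and a parameter update; real frameworks replace these explicit loops with vectorized array operations and automatic differentiation, which is where the "easy to vectorize, easy to parallelize" part comes in.

```python
import math
import random


def train_mlp(xs, ys, hidden=8, lr=0.02, steps=1000, seed=0):
    """Fit y ~ sum_j v[j] * tanh(w[j] * x + b[j]) + c by full-batch gradient descent.

    Gradients of the mean squared error come from hand-written backpropagation
    (the chain rule applied layer by layer). Returns the loss recorded at each step.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    v = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    c = 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: hidden activations, then the linear output layer.
            h = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
            err = sum(v[j] * h[j] for j in range(hidden)) + c - y
            loss += err * err / n
            # Backward pass: d(loss)/d(prediction), then push it through each layer.
            d = 2.0 * err / n
            gc += d
            for j in range(hidden):
                gv[j] += d * h[j]
                dz = d * v[j] * (1.0 - h[j] * h[j])  # tanh'(z) = 1 - tanh(z)^2
                gw[j] += dz * x
                gb[j] += dz
        losses.append(loss)
        # Gradient-descent update, applied after all gradients are accumulated.
        for j in range(hidden):
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]
            v[j] -= lr * gv[j]
        c -= lr * gc
    return losses


xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]
losses = train_mlp(xs, ys)
print(losses[0], losses[-1])
```

Note that, unlike the least-squares problem for polyreg, this objective is not convex in (w, b, v, c); gradient descent is only guaranteed to make local progress, which in practice is usually good enough.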

Isn't it rather the other way around, that regression methods informed the development of NNs?

I suppose the finding isn't novel. The contribution here might still be didactic: demystifying NNs by restating a fundamental notion in familiar terms. For example, it exposes implicitly (that is, it leaves the reader to arrive at the insight) that "OTOH you have to design features yourself [for the polyreg, because ...]". Although, since "Feature Engineering" is a buzzword in NN development, I still don't understand the difference. Indeed, the paper implies there is none except for the approach and terminology.

Essentially, they maintain that the traditional terminology should not be discarded in favour of the new, but rather understood in context. The understanding part is left to the reader, of course.

Relatedly, in a linear algebra course, Terry Tao's one-pixel camera was credited with the success of the ensuing ML resurgence, while we were otherwise talking about convolutions, Fourier synthesis, wavelets and the like. It's no secret that linear algebra is a cornerstone of ... just a cornerstone, and that abstract algebra, topology and the like lie very near; at least this lecturer made it a main concern of the course to get that across, which fit in nicely with a logic course on universal algebra that I took in parallel, while others taught rather monotonously towards Taylor series and Fourier transforms. At any rate, pretty much all professional researchers in the field stress that the mathematical basis needs to be understood.

PS: The paper is co-written with an undergrad, whatever that means; no offence, I really don't know the slang, much less how much study vs. research this implies. The publishing coauthor (who is also the blog host) shows some resentment against newfangled, fanciful terminology in the blog's About section, which might explain the scope of the paper, as well as its intended extent, in line with the impressions I outlined above.

Your criticism is not quite correct, insofar as the blog post states the paper's conclusions explicitly, and these seem to be about explaining common pitfalls in statistical terms.

PPS: Many observers lament that the results of NNs are intractable, nigh impossible to verify. This stands in stark contrast to mathematical rigor. Hoping for the traditional methodology to catch up is justified in principle. I'm sure your remark about design by hand being intractable holds as well; I'm just not sure to what extent. Showing that it can be done reasonably for some problems is a start, and chronicling that endeavor is par for the course, but perhaps not yet enough, I guess, since another comment asks for benchmarks.