Hacker News
by beforeolives 1912 days ago
This is true but I don't like that the math understanding is often treated like some magic secret that only a select few have access to. As long as you have a decent foundation you can research stuff and get deeper into the theory as necessary. And this inevitably happens when you switch problem domains, even if you've gone really deep into a certain type of algorithm.

It's not too different from a software developer figuring out a new API/system/technique etc. but for some reason the attitude is that you either know it or you don't.

And in the context of this article, I think that the point is that even the deeper understanding isn't that valuable to many organisations.

2 comments

I've been realising this recently. While I'm a professional programmer, I only ever really learned the maths I needed for my degree, and even then most of that got forgotten after I graduated, beyond what's necessary for my day-to-day work. I did a bit of ML at university and I've been meaning to pick it up again, but wanted to avoid half-arsing it by just learning the libraries rather than the underlying mathematical principles as well. One of the mental hurdles has been getting over this idea of "ML maths" as this black box. I've started with some linear algebra courses and while it's very interesting in its own right, it's also shown me I have some pretty enormous gaps in my knowledge!

Next time I'm between jobs (hopefully won't be for a long time) I'm going to revisit maths as its own thing. I really want to get my calculus and trigonometry up to scratch as well as things like linear algebra and statistics. It's interesting how quickly it leaves your head too: I did pretty well at university with ML but having not exercised those muscles so much fell out the instant that exam timer hit zero.

> having not exercised those muscles so much fell out

I had the same experience - I learned enough to pass the tests and then forgot everything as soon as the semester ended. I picked it back up out of genuine interest years later and it was amazing how much I retained now that I was actually studying because I wanted to rather than because I had to. You might also be surprised how much you actually do remember, hidden just under the surface of your consciousness, if you do go back and try to remediate on your own.

I think it could be more valuable to find a problem you find interesting and see if there is a model in the literature that you can use / implement starting with the most general/available and specializing if necessary. Most of ML work is, as the article alludes to, collecting and managing data.
What math, anyway? Knowing what a Gaussian is? There's really surprisingly little math in the whole field, and people even refuse to put things in mathematical terms if at all possible.
Agreed. It's really more like playing with Lego. Take residual connections, for instance. The insight was that information wasn't traveling far enough when the networks got too deep. So they just... plugged the earlier layers into the later layers. And this has been a very important development.
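
To make the "plugged the earlier layers into the later layers" idea concrete, here is a minimal forward-pass sketch in plain Python. Everything in it is an illustrative assumption rather than anything from the original papers: the toy `layer` function, the fixed weight of 0.1, and the depth of 50 are all made up, and nothing is being trained.

```python
import math

def layer(x, weight):
    """Hypothetical toy layer: scale each element, then squash with tanh."""
    return [math.tanh(weight * v) for v in x]

def plain_stack(x, weight, depth):
    """Apply the layer `depth` times with no skip connections."""
    for _ in range(depth):
        x = layer(x, weight)
    return x

def residual_stack(x, weight, depth):
    """Same layers, but each block outputs its input plus layer(input)."""
    for _ in range(depth):
        x = [xi + fi for xi, fi in zip(x, layer(x, weight))]
    return x

signal = [1.0, -0.5, 0.25]
plain = plain_stack(signal, weight=0.1, depth=50)        # input shrinks towards zero
residual = residual_stack(signal, weight=0.1, depth=50)  # input is still present

print(plain)
print(residual)
```

With small weights the plain stack multiplies the signal down by at least a factor of ten per layer, so almost nothing of the input survives 50 layers, while the residual stack always carries the input forward through the identity path.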

Or things like batch norm. We don't know why it's important. People do math to try to explain what's going on, not so much to figure out where we should go next.

Related: Understanding is a poor substitute for convexity https://www.edge.org/conversation/nassim_nicholas_taleb-unde...

> things like batch norm. We don't know why it's important

It is pretty well understood.

Posts like this really piss me off, because you can make anything sound small. So don't take the next few paragraphs as me attacking you, I'm just venting a general sentiment I've had for a while. (Looking back after typing it out, I might actually flesh it out and post it on my blog; I hope it's somewhat thought-provoking for others as well.)

It's like someone saying "well, electrical engineering is really like playing with building blocks. Take Zener diodes for example: the problem was that you can have a lot of power in a circuit, but if there's a power spike it might break. So they just... plugged in a piece that breaks by shunting the power spike into the ground, then resets. And this is now a major piece of electronics." Or "so, everyone is always going on about Arabian mathematicians, but one 'big development' they made was to invent the zero: just make up a symbol where previously you'd leave a space. It's basically just a change of notation!".

Deep learning theory (statistical, information-theoretic and optimisation-wise) is our process of understanding how to design systems that adapt themselves to feedback, and how to encode tasks in them. Batchnorm was inspired by one thing (internal covariate shift), and that thing was plausible, but in systems as complex (not complicated, complex as in interactions) as universal function approximators, adding one thing can radically change things. As it turns out, batchnorm smooths the function, it decouples parameter magnitude and direction, and it improves signal propagation. How else would you have figured this out without having systems like neural networks with batchnorm already in place that you can study? And now there are lines of work emerging that do away with batchnorm, but have distilled its positive properties into smaller techniques (https://arxiv.org/pdf/2102.06171.pdf, Soham De gave a lecture at our lab recently).
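
For anyone who hasn't looked at what batchnorm actually computes, here is a rough training-time sketch for a single feature, in plain Python. The function name, the sample numbers and the defaults for `gamma`, `beta` and `eps` are assumptions for illustration; real implementations also track running statistics for inference, which is left out here.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # population variance of the batch
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

activations = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(activations)

# Scaling the inputs by 10 (as if the preceding weights were 10x larger)
# leaves the normalised output almost unchanged.
scaled = batch_norm([10.0 * v for v in activations])

print(normed)
print(scaled)
```

The second call is one way to see the "decouples parameter magnitude and direction" point: up to the small `eps` term, the output does not depend on the overall scale of what comes in.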

Same thing with skip connections: Jürgen Schmidhuber will rightfully point out we've had highway networks since his heyday, but details matter. It is really not intuitive before you do it that in such a complex system, skip connections will be beneficial, because before we had them and started studying them on complex systems, the ideas of thinking of them as learning small adjustments to a signal, or as an ensemble of shallow learners, or the other perspectives that they have been studied under, had not been developed.
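
As a rough illustration of those two perspectives, take two residual blocks with hypothetical layer functions $f_1$ and $f_2$ (the notation is assumed here, not taken from any particular paper):

```latex
\begin{align*}
y_1 &= x + f_1(x) \\
y_2 &= y_1 + f_2(y_1) = x + f_1(x) + f_2\bigl(x + f_1(x)\bigr)
\end{align*}
% If f_1 and f_2 happen to be linear, this expands into four separate paths:
\begin{equation*}
y_2 = x + f_1(x) + f_2(x) + f_2\bigl(f_1(x)\bigr)
\end{equation*}
```

Each block only adds a correction to the signal it receives, which is the "small adjustments" view. In the linear case the output is literally a sum over paths of depth 0, 1, 1 and 2, which is the flavour of the "ensemble of shallow learners" view; for nonlinear layers the split is no longer exact, so treat this as intuition only.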

And how would you? Without having them working really well, you'd have to start thinking about them from first principles in the giant design space of nonlinear, nonconvex functions, without being able to prove anything because we don't have the mathematical formalism yet.

Deep learning theory and nonconvex optimisation right now is a new physics born out of the marriage of information theory, computer science and computer engineering (and not surprisingly in a ménage à trois, a lot of groundwork was laid by the French and other weird Europeans /joke). We have a bunch of theory nerds trying to explain what we see in elegant and concise mathematical frameworks and trying to come up with testable predictions, and a bunch of experimentalists actually coming up with ways of testing it, gluing together the bits of understanding we have with soft knowledge to make the learning engine go brr, and giving feedback to the theorists on what held up, what didn't work predictably and what didn't go according to predictions. And people mouth off about the empirical nature of things.

Well, I ask: How else would you figure this stuff out? I think there is a cult of genius at play here, where if you don't start with category theory and platonic ideal conceptions of reality and derive your model without any experiment, you are somehow lesser.

Well, despite what people like to sell, disruption is a lie, everything is incremental, and without having the hackers make things work in clunky ways, the theorists would circle jerk themselves in creative dead ends because of a lack of stimulus.

And as always, there are a lot of people who make themselves sound smarter by affecting superiority and disdain for this scientific process, while in the background nerds deepen our understanding of the universe.

Depends on which part of the field you're referring to. Deep learning is arguably the least theory-intensive and it still requires at least calculus and linear algebra to understand what's going on.