by sriku 3116 days ago
> You become so acutely aware of the limitations of what you’re doing that the interest just gets beaten out of you. You would never go and say, “Oh yeah, I know the secret to building human-level AI.”

A colleague of mine called these "educated incapacities" - where we become acutely aware of impossibilities and lose sight of possibilities. Andrej Karpathy, in one of his interviews iirc, said something like "if you ask folks in nonlinear optimization, they'll tell you that DL is not possible".

It is useful to keep that innocence alive despite being educated, especially if the cost of trying something out doesn't involve radical health risks. That plus a balance with scholarship.

Knowledge, courage and the means to execute are all needed.

3 comments

Right. I found that part of the article particularly irritating - there are tons of examples of researchers making substantial contributions outside of their primary field, cf https://mathoverflow.net/q/173268/6360

> If you ask folks in nonlinear optimization, they'll tell you that DL is not possible.

I sincerely doubt anyone who knows more than one sentence about deep learning would say that, since deep learning doesn't claim to optimize.

i suspect that what he's referring to is that he's heuristically minimizing a somewhat arbitrary (loss) function in a million-ish dimensions using the simple variants of gradient descent that work under these conditions. it sounds far too WIBNI to produce good results reliably (in practice, let alone in theory). the landscape has so many stationary points at which to get stuck; why would you ever get good results?

there's a small cottage industry of papers (like [0]) that try to explain this.

[0] https://arxiv.org/pdf/1412.0233.pdf
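
To make the "stuck at a stationary point" worry concrete, here is a minimal sketch. It is a toy: the 1-D function, the learning rate, the step count and the two starting points are all assumptions picked for illustration, and none of it comes from the linked paper or from any real network. Plain gradient descent on a non-convex function settles into a different minimum depending on where it starts.

```python
# Toy illustration only: a 1-D non-convex function, not a neural-network loss.
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain fixed-step gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


# Two hypothetical starting points, one on each side of the central bump.
left = gradient_descent(-2.0)
right = gradient_descent(2.0)

print(f"start -2.0 -> x = {left:.3f}, f(x) = {f(left):.3f}")
print(f"start  2.0 -> x = {right:.3f}, f(x) = {f(right):.3f}")
```

The run from -2.0 should settle near x ≈ -1.30, the lower of the two minima, while the run from 2.0 stops near x ≈ 1.13, a strictly worse local minimum. In one dimension that looks damning; the open question in the thread is why the million-dimensional case behaves so much better than this picture suggests.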

I think this recent paper [1] sheds quite a bit of light on this.

[1] https://arxiv.org/abs/1703.00810v3

Really don't think that's the best paper to say "sheds quite a bit of light on this". That paper has been somewhat controversial since it came out.

I think https://arxiv.org/abs/1609.04836 is seminal in showing that non-sharp minima go with good generalization. The parent's paper is good for showing that gradient descent over non-convex surfaces works fine. https://arxiv.org/abs/1611.03530 is the landmark for kicking off this whole generalization business (it mainly shows that traditional models of generalization, namely VC dimension and ideas of "capacity", don't make sense for neural nets).

You are right. Unfortunately, many (doubly unfortunately, even in academia; well, many who switched careers from optimization to ML) think that machine learning is just optimization.

Regarding deep NNs, one should be careful what one wishes for, because wishes sometimes come true. Ending up at the global optimum of that thing would likely be the last thing one wants.

The key to deep NNs is to do such a pathetic job of optimizing the loss that the generalization is good. A problem is that there are several different ways of doing a job poorly, and not all of them would generalize well. When I have my engineer hat on, I would rather not have lots of indeterminism on my watch if I can afford it. Too dang hard to maintain correctness of.

On the other hand if one has a "with high probability" style result where the probabilities are high enough to be practically relevant, then we have something more workable.
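
A rough sketch of why the exact optimum of the training loss can be unwelcome. Everything here is a made-up toy and not a neural net: the four training labels are pure noise around a true value of 0, and the two "models" (a constant and a cubic through every point) are assumptions chosen so the arithmetic is easy to check by hand. The cubic is the global optimum of the training loss, since it reaches zero error, yet it is the worse predictor of the true function.

```python
# Toy illustration only: the true function is 0 everywhere and the
# training labels are pure noise around it (hypothetical data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, -0.5, 0.5, -0.5]


def interpolant(x):
    """Cubic through all four points (Lagrange form): zero training loss."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


mean_y = sum(ys) / len(ys)


def constant(x):
    """Least-squares constant fit: a deliberately 'poor' optimizer."""
    return mean_y


def mse(model, points, targets):
    return sum((model(p) - t) ** 2 for p, t in zip(points, targets)) / len(points)


# Held-out grid, scored against the true function (0 everywhere).
test_xs = [i / 10 for i in range(31)]
test_ys = [0.0] * len(test_xs)

train_interp = mse(interpolant, xs, ys)
train_const = mse(constant, xs, ys)
test_interp = mse(interpolant, test_xs, test_ys)
test_const = mse(constant, test_xs, test_ys)

print("train MSE: interpolant", train_interp, "constant", train_const)
print("test  MSE: interpolant", test_interp, "constant", test_const)
```

The interpolant wins on the training set and loses on the held-out grid because it has fitted the noise. This only shows the general overfitting mechanism; whether and how it carries over to deep nets is exactly what the papers linked above argue about.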

I don't understand why you don't want a global optimum. Is this obvious? Does the following paragraph explain it? Because I don't see the connection.

It happens when practitioners generalize theorems to scenarios that look similar but where the theorems don't apply. The common pattern is misapplying an infinite-set theorem to the finite-set case. If you don't know about the theorem in question to begin with, there is no way for you to misrepresent it.

I think there is also a pretty pervasive over-estimation of how capable humans are.

As I see more of the failure modes of deep learning, a lot of successes and mistakes made by humans start to become more understandable. Machines don't need to be perfect or avoid failures; like humans, they need to work most of the time and then be used in systems that are tolerant of their potential faults and mistakes.