by jhrmnn 1839 days ago
There are a few works that try to put deep learning on a theoretical basis. I like this one, for example:

https://arxiv.org/abs/1703.00810

This goes beyond mere intuition, but it is also still very far from a “complete theory”.

I find it disappointing that so few people in deep learning work on the theoretical foundations.

3 comments

What are some subfields of mathematics that you would say are crucial for gaining a proper understanding of all the things related to deep learning (e.g. let's say the paper you linked)? Even though the theory isn't complete, I'm sure a grounding in certain fields of mathematics will be helpful.
This is always difficult to answer, and it will probably be a mixture of many fields. However, I am currently following categorical approaches to machine learning. Category theory is the area of mathematics that studies composable structures, e.g. layers in a deep network. It is very abstract and was invented to solve problems in algebraic topology, but it has been fruitful in other areas as well.
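To make "composable structures" a bit more concrete, here is a minimal sketch in plain Python. It treats layers as functions on vectors and checks the two laws a category asks of composition (associativity and an identity). The names `compose`, `scale` and `relu` are made up for illustration, and this is only a toy picture, not how any categorical ML framework actually works.

```python
from functools import reduce
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def compose(*layers: Layer) -> Layer:
    """Return one layer that applies `layers` from left to right."""
    def composed(x: Vector) -> Vector:
        return reduce(lambda acc, layer: layer(acc), layers, x)
    return composed


def scale(factor: float) -> Layer:
    """Hypothetical 'linear' layer: multiply every entry by `factor`."""
    return lambda x: [factor * v for v in x]


def relu(x: Vector) -> Vector:
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, v) for v in x]


# Composing zero layers gives the identity morphism.
identity = compose()

f, g, h = scale(2.0), relu, scale(-1.0)
x = [1.0, -3.0]

# Associativity: (f ; g) ; h and f ; (g ; h) are the same network.
left = compose(compose(f, g), h)(x)
right = compose(f, compose(g, h))(x)
assert left == right

# Identity: composing with the identity layer changes nothing.
assert compose(identity, f)(x) == f(x) == compose(f, identity)(x)
```

The point is only that "stacking layers" is function composition, so anything provable about composition in general applies to networks too.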
That you say this illustrates that the situation today is "take whatever math-stuff you have, throw it against neural networks and see what you get". I.e., I'm pretty sure not much progress has been made with category theory and neural networks - but you might be the first.

I've seen differential equations, Markov chains, differential geometry and other stuff. We might be in heady days before the "big breakthrough" is made. But these constructs might be inherently pathological (even then, non-pathological variants might be possible).

It’s good for publishing papers, and for the public sector. Take an obscure area of math and throw it at the latest trend.

Whether it’s useful is an open question.

It can be useful for innovation in the aggregate.
Could you give some favourite references, some use of category theory in ML which gives good results compared to standard approaches?

Is there a group doing this in Zurich?

Dynamical systems and chaos theory (especially for neural networks), information theory (especially for the paper linked), probability theory (especially the more foundational and axiomatic work)
You can start with this: https://arxiv.org/abs/1603.04929
Of the many "understanding neural networks" papers this is one of the few valuable ones.
Agreed. Until we get to the point where there are theorems of the form, for example, "Given a problem satisfying conditions X, the optimal number of layers to minimize expected training time for data satisfying Y is Z", it is just stamp collecting.