Hacker News
by _as_text 942 days ago
I just skimmed through it for now, but it has seemed kinda natural to me for a few months now that there would be a deep connection between neural networks and differential or algebraic geometry.

Each ReLU layer is just a (quasi-)linear transformation, and a pass through two layers is basically also a linear transformation. If you say you want some piece of information to stay (numerically) intact as it passes through the network, you say you want that piece of information to be processed in the same way in each layer. The groups of linear transformations that "all process information in the same way, and their compositions do, as well" are basically the Lie groups. Anyone else ever had this thought?

I imagine if nothing catastrophic happens we'll have a really beautiful theory of all this someday, which I won't create, but maybe I'll be able to understand it after a lot of hard work.

6 comments

You might be interested in this workshop: https://www.neurreps.org/

And a possibly relevant paper from it:

https://openreview.net/forum?id=Ag8HcNFfDsg

ReLU is quite far from linear: adding ReLU activations to a linear layer amounts to fitting a piecewise-linear model of the underlying data.

Well, at all but a finite number of points (specifically all but one point), there is a neighborhood of that point on which ReLU matches a linear function...

In one sense, that seems rather close to being linear. If you take a random point (according to a continuous probability distribution), then with probability 1, if you look in a small enough neighborhood of the selected point, the function will be indistinguishable from linear within that neighborhood.

And, for a network made of ReLU gates and affine maps, you still get that it looks indistinguishable from affine on any small enough region around any point outside of a set of measure zero.

So... Depends what we mean by “almost linear” I think. I think one can make a reasonable case for saying that, in a sense, it is “almost linear”.

But yes, of course I agree that in another important sense, it is far from linear. (E.g. it is not well approximated by any linear function)
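
For anyone who wants to poke at this numerically, here is a minimal sketch (plain Python, nothing from the thread; the network sizes, seed, and helper names are all made up for illustration). It uses the fact that an affine map f satisfies f(x + d) + f(x - d) - 2 f(x) = 0, so the size of that expression measures how far the network is from affine between x - d and x + d. The expectation is that the defect sits at rounding-error level for tiny steps and becomes visible once the step is large enough to cross a ReLU kink.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, v):
    # Computes W v + b for a list-of-rows matrix W.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def make_net(sizes, rng):
    # Hypothetical helper: random Gaussian weights and biases per layer.
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0, 1) for _ in range(n_out)]
        layers.append((W, b))
    return layers

def forward(layers, x):
    # Affine map followed by ReLU on every layer except the last.
    h = x
    for i, (W, b) in enumerate(layers):
        h = affine(W, b, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h

def affine_defect(layers, x, d):
    # Largest component of f(x + d) + f(x - d) - 2 f(x); zero for affine f.
    fp = forward(layers, [a + e for a, e in zip(x, d)])
    fm = forward(layers, [a - e for a, e in zip(x, d)])
    f0 = forward(layers, x)
    return max(abs(p + m - 2 * c) for p, m, c in zip(fp, fm, f0))

rng = random.Random(0)
net = make_net([3, 8, 8, 2], rng)
x = [rng.gauss(0, 1) for _ in range(3)]
d = [rng.gauss(0, 1) for _ in range(3)]
for scale in (1e-6, 1e-3, 1.0, 10.0):
    print(scale, affine_defect(net, x, [scale * e for e in d]))
```

This only checks one random point and one direction, so it illustrates the "locally affine almost everywhere" claim rather than proving it.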

Yeah, and we have more than measure zero -- the subsets of the input space on which a fully ReLU MLP is linear are Boolean combinations of half-spaces. I was coming at it from the heuristic that if you can triangulate a space into a finite number of easily computable convex sets such that the inside of each one has some trait, then it's as good as saying that the space has this trait. But of course this heuristic doesn't always have to be true, or useful.
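
A rough way to see those regions, as a sketch only (the toy weights and function names below are invented for illustration): for a single hidden ReLU layer, each unit is "on" on one side of a hyperplane, so every on/off pattern picks out an intersection of half-spaces, and the network is affine on each such piece. Sampling a grid and collecting the distinct patterns gives a count of the pieces the grid touches.

```python
from itertools import product

def activation_pattern(W, b, x):
    # Tuple of booleans: which hidden units have pre-activation > 0 at x.
    return tuple(
        sum(w * xi for w, xi in zip(row, x)) + bi > 0
        for row, bi in zip(W, b)
    )

def patterns_on_grid(W, b, lo=-3.0, hi=3.0, steps=25):
    # Distinct activation patterns seen on a steps x steps grid in 2D.
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return {activation_pattern(W, b, (x, y)) for x, y in product(xs, xs)}

# Toy hidden layer (assumed for illustration): units fire on
# x > 0, y > 0, and x + y > 1 respectively.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -1.0]

patterns = patterns_on_grid(W, b)
print(len(patterns), "distinct patterns out of", 2 ** len(W), "conceivable")
for p in sorted(patterns):
    print(p)
```

Not every on/off combination is realizable: here no point has x <= 0, y <= 0 and x + y > 1, so that pattern never shows up. For deeper networks the same bookkeeping works per layer, but the boundaries seen from input space are no longer single flat hyperplanes.
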

Everything is something. Question is what this nomenclature gymnastics buys you? Unless you answer that, this is no different than claiming neural networks are a projection of my soul.

Could looking at NNs through the lens of group theory unlock a lot of performance improvements?

If they have inner symmetries we are not aware of, you can avoid waste in searching in the wrong directions.

If you know that some concepts are necessarily independent, you can exploit that in your encoding to avoid superposition.

For example, I am using cyclic groups and dihedral groups, and prime powers to encode representations of what I know to be independent concepts in a NN for a small personal project.

I am working on a 32-bit (perhaps float) representation of mixtures of quantized von Mises distributions (time of day patterns). I know there are enough bits to represent what I want, but I also want specific algebraic properties so that they will act as a probabilistic sketch: an accumulator or a Monad if you like.
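
To make "mixture of quantized von Mises distributions" concrete, here is one possible reading as a sketch (not the actual project code; the function names, the 24-bucket choice, and the example parameters are assumptions). It evaluates the unnormalized von Mises density exp(kappa * cos(theta - mu)) at each hourly bucket and renormalizes over the buckets, which avoids needing the Bessel-function normalizer. It says nothing about the 32-bit packing or the sketch operator.

```python
import math

def quantized_von_mises(mu_hour, kappa, buckets=24):
    # Discrete distribution over hourly buckets, peaked at mu_hour.
    # kappa is the concentration: larger kappa means a sharper peak.
    weights = [
        math.exp(kappa * math.cos(2 * math.pi * (h - mu_hour) / buckets))
        for h in range(buckets)
    ]
    total = sum(weights)
    return [w / total for w in weights]

def mixture(components, buckets=24):
    # components: list of (weight, mu_hour, kappa); weights are renormalized.
    total_w = sum(w for w, _, _ in components)
    out = [0.0] * buckets
    for w, mu, kappa in components:
        for h, p in enumerate(quantized_von_mises(mu, kappa, buckets)):
            out[h] += (w / total_w) * p
    return out

# Example: a morning-heavy pattern plus a broader evening one (made-up numbers).
pattern = mixture([(0.7, 9, 4.0), (0.3, 21, 2.0)])
for hour, p in enumerate(pattern):
    print(f"{hour:02d}h {p:.4f}")
```

Because the cosine is periodic, hour 23 and hour 1 are treated as equally close to midnight, which is the property that makes the von Mises family a natural fit for time-of-day data.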

I don't know the exact formula for this probabilistic sketch operator, but I am positive it should exist. (I am just starting to learn group theory and category theory, to solve this problem; I suspect I want a specific semi-lattice structure, but I haven't studied enough to know what properties I want)

My plan is to encode hourly buckets (location) as primes and how fuzzy they are (concentration) as their powers. I don't know if this will work completely, but it will be the starting point for my next experiment: try to learn the probabilistic sketch I want.
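
A minimal sketch of what that encoding could look like, under assumptions that are mine and not the poster's: concentration is an integer level, hour h maps to the (h+1)-th prime, and two sketches are merged with lcm (which takes the max exponent per prime). The names `encode`, `decode`, and `merge` are hypothetical. The lcm merge is commutative, associative, and idempotent, so it gives a join-semilattice, which may or may not be the structure being sought.

```python
import math

# First 24 primes, one per hourly bucket.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
          41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89]

def encode(levels):
    # levels: dict mapping hour (0..23) to a non-negative integer level.
    n = 1
    for hour, level in levels.items():
        n *= PRIMES[hour] ** level
    return n

def decode(n):
    # Recover the hour -> level dict by trial division over the 24 primes.
    levels = {}
    for hour, p in enumerate(PRIMES):
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            levels[hour] = e
    return levels

def merge(a, b):
    # lcm: per-prime maximum of exponents.
    return a * b // math.gcd(a, b)

a = encode({9: 2, 17: 1})
b = encode({9: 1, 12: 3})
print(decode(merge(a, b)))
```

One caveat: Python integers are unbounded, so this ignores the 32-bit budget. The product of the first nine primes is 223092870, and multiplying by the tenth (29) already exceeds 2^32, so a literal product-of-primes code cannot cover many buckets in 32 bits without some further trick.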

I suspect that I will need different activation functions than you'd normally use in a NN, because linear or ReLU or similar won't be good to represent in finite space what I am searching for (likely a modular form or L-function). Looking at Koopman operator theory, I think I need to introduce non-linearity in the form of a Theta function neuron or Ramanujan Tau function (which is very connected to my problem).

I would argue that there are a few fundamental ways to make progress in mathematics:

1. Proving that a thing or set of things is part of some grouping

2. Proving that a grouping has some property or set of properties (including connections to or relationships with other groupings)

These are extremely powerful tools, and they buy you a lot because they let you connect new things to mathematical work that has been done in the past. So, for example, if the GP surmises that something is a Lie group, that buys them a body of results stretching back to the 19th century, which can be applied to understand these neural nets even though they are a modern concept.

> what this nomenclature gymnastics buys you?

???

Are you writing off all abstract mathematics as nomenclature gymnastics, or is there something about this connection that you think makes it particularly useless?

I did a little spelunking some time ago reacting to the same urge. Tropical geometry appears to be where the math talk is at.

Just dropping the reference here, I don't grok the literature.

> deep connection between neural networks and differential or algebraic geometry

I disagree with how you came to this conclusion (because it ignores non-linearity of neural networks), but this is pretty true. Look up gauge invariant neural networks.

The "Mathematics of Deep Learning" course by Bruna et al. might also be interesting to you.

What? The very point of neural networks is representing non-linear functions.