Hacker News
by rlupi 941 days ago
Could looking at neural networks through the lens of group theory unlock a lot of performance improvements?

If a network has inner symmetries we are not aware of, knowing them would let us avoid wasting effort searching in the wrong directions.

If you know that some concepts are necessarily independent, you can exploit that in your encoding to avoid superposition.

For example, I am using cyclic groups, dihedral groups, and prime powers to encode representations of what I know to be independent concepts in an NN for a small personal project.

I am working on a 32-bit (perhaps float) representation of mixtures of quantized von Mises distributions (time-of-day patterns). I know there are enough bits to represent what I want, but I also want specific algebraic properties so that the representation acts as a probabilistic sketch: an accumulator or a Monad if you like.
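To make "quantized von Mises mixture" concrete, here is a minimal sketch that evaluates a mixture at the midpoints of 24 hourly buckets and normalizes the result. The names `von_mises_pdf` and `hourly_mixture`, the `(weight, peak_hour, kappa)` tuple layout, and the choice of bucket midpoints are all assumptions made for illustration; this is not the 32-bit encoding itself, only the object it would have to represent.

```python
import math

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density at angle theta (radians), mean mu, concentration kappa."""
    # Modified Bessel function I0(kappa) from its power series:
    # I0(x) = sum_k (x/2)^(2k) / (k!)^2. 50 terms is plenty for moderate kappa.
    i0 = sum((kappa / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(50))
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * i0)

def hourly_mixture(components, buckets=24):
    """Quantize a mixture of von Mises components into per-bucket probabilities.

    components: list of (weight, peak_hour, kappa) tuples (hypothetical layout).
    The density is sampled at each bucket midpoint and then renormalized.
    """
    probs = []
    for h in range(buckets):
        theta = 2 * math.pi * (h + 0.5) / buckets  # midpoint of bucket h
        probs.append(sum(
            w * von_mises_pdf(theta, 2 * math.pi * peak / buckets, kappa)
            for w, peak, kappa in components
        ))
    total = sum(probs)
    return [p / total for p in probs]

# A morning peak around 08:30 and a broader evening peak around 18:30.
pattern = hourly_mixture([(0.6, 8.5, 4.0), (0.4, 18.5, 1.5)])
```

Because the density depends only on `cos(theta - mu)`, a peak just before midnight spills into the first bucket of the day without any special casing, which is the main reason to use a circular distribution here.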

I don't know the exact formula for this probabilistic sketch operator, but I am positive it should exist. (I am just starting to learn group theory and category theory to solve this problem; I suspect I want a specific semi-lattice structure, but I haven't studied enough to know what properties I want.)

My plan is to encode hourly buckets (location) as primes and how fuzzy they are (concentration) as their powers. I don't know if this will work completely, but it will be the starting point for my next experiment: try to learn the probabilistic sketch I want.
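A minimal sketch of that prime-power encoding, assuming each hourly bucket gets one of the first 24 primes and the concentration is a small non-negative integer level used as the exponent. The `encode`, `decode`, and `merge` helpers are hypothetical names, and merging with the least common multiple is an assumption: it takes the per-bucket maximum of the exponents, which makes the merge associative, commutative, and idempotent (a join semilattice), one candidate for the accumulator behaviour described above.

```python
from math import gcd

# One prime per hourly bucket (the first 24 primes).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
          41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89]

def encode(levels):
    """Map {hour: concentration_level} to a product of prime powers."""
    n = 1
    for hour, level in levels.items():
        n *= PRIMES[hour] ** level
    return n

def decode(n):
    """Recover {hour: concentration_level} by trial division over PRIMES."""
    levels = {}
    for hour, p in enumerate(PRIMES):
        while n % p == 0:
            n //= p
            levels[hour] = levels.get(hour, 0) + 1
    return levels

def merge(a, b):
    """Assumed merge operator: lcm, i.e. per-bucket max of the exponents."""
    return a * b // gcd(a, b)

morning = encode({8: 2})            # bucket 8, level 2
mixed = encode({8: 1, 17: 3})       # buckets 8 and 17
combined = merge(morning, mixed)    # decodes to {8: 2, 17: 3}
```

Python integers are arbitrary precision, so this sketch sidesteps the 32-bit budget. In a fixed-width word the encoding fills up quickly: the product of the first ten primes already exceeds 2^32, so only a handful of active buckets fit without a more compact scheme.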

I suspect that I will need different activation functions than you'd normally use in an NN, because linear, ReLU, or similar functions won't be good at representing in finite space what I am searching for (likely a modular form or L-function). Looking at Koopman operator theory, I think I need to introduce non-linearity in the form of a Theta function neuron or the Ramanujan Tau function (which is very connected to my problem).