Hacker News
by thomasahle 526 days ago
I've found that thinking of tensors in terms of graphs makes einsums much more natural.

For example, a matrix product MN, `a b, b c -> a c`, is just two nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are connected, so the resulting graph has only two "free" edges, `a` and `c`. That's how we know the result is another matrix.
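As a quick sanity check of the graph picture, here is a minimal sketch using NumPy's `np.einsum` (my choice of library, not necessarily what the cookbook uses). The helper `free_edges` is hypothetical: it treats each comma-separated term as a node and counts how often each edge label appears, so labels seen exactly once are the dangling "free" edges. It assumes each label appears at most twice, i.e. ordinary edges rather than hyperedges.

```python
from collections import Counter

import numpy as np


def free_edges(spec):
    # Hypothetical helper: each comma-separated term is a node, each letter an edge end.
    nodes = spec.replace(" ", "").split("->")[0].split(",")
    counts = Counter("".join(nodes))
    # A label seen once dangles (free); seen twice, it joins two edge ends.
    # dict.fromkeys keeps first-appearance order, so "ab,bc" gives "ac".
    return "".join(label for label in dict.fromkeys("".join(nodes)) if counts[label] == 1)


M = np.arange(6.0).reshape(2, 3)   # node M with edges a (size 2), b (size 3)
N = np.arange(12.0).reshape(3, 4)  # node N with edges b (size 3), c (size 4)

print(free_edges("a b, b c -> a c"))  # ac
MN = np.einsum("ab,bc->ac", M, N)     # contract the shared b edge
print(MN.shape)                       # (2, 4)
print(np.allclose(MN, M @ N))         # True
```

The same count works for longer chains: in `ab,bc,cd` only `a` and `d` dangle, so the result is again a matrix.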

Once you look at tensors this way, a number of things that are normally tricky in standard matrix notation become trivial, such as the higher-order derivatives used in neural networks.
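To make the derivative point concrete, here is a small NumPy sketch of my own (not taken from the cookbook): the derivative of the matrix product MN with respect to M. In diagram terms, cutting the M node out of `-a- M -b- N -c-` leaves an identity wire between the output `a` edge and a new `a2` edge, sitting next to N, so even this first derivative is already an order-4 tensor with entries δ[a, a2]·N[b, c]. The edge name `a2` and the random matrices are illustrative assumptions. Since MN is linear in M, contracting the Jacobian with any perturbation dM must reproduce dM @ N exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((2, 3))  # edges a (2), b (3)
N = rng.standard_normal((3, 4))  # edges b (3), c (4)

# d(MN)[a, c] / dM[a2, b] = delta[a, a2] * N[b, c]
# Subscripts: i = a, k = a2 (identity wire), l = b, j = c (the N node).
J = np.einsum("ik,lj->ijkl", np.eye(M.shape[0]), N)  # shape (a, c, a2, b)

# Plug a perturbation dM into the two open M-edges of the Jacobian.
dM = rng.standard_normal(M.shape)
directional = np.einsum("ijkl,kl->ij", J, dM)

print(J.shape)                                         # (2, 4, 2, 3)
print(np.allclose(directional, (M + dM) @ N - M @ N))  # True
```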

I wrote https://tensorcookbook.com/ to give a simple reference for all of this.


A few years ago I wrote a note (never published) on how many products can be seen in this way: https://arxiv.org/abs/1903.01366
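One standard example of a product seen this way (my illustration, not taken from the linked note) is the elementwise (Hadamard) product: two vector wires meet at an order-3 copy tensor, which is 1 where all its indices agree and 0 elsewhere, and one wire comes out. The helper `copy_tensor` is hypothetical. Its order-1 version is just the all-ones vector, which is the `1-` that comes up later in the thread for broadcasting.

```python
import numpy as np


def copy_tensor(n, order):
    # Kronecker/copy tensor: 1 where all `order` indices are equal, 0 elsewhere.
    T = np.zeros((n,) * order)
    T[(np.arange(n),) * order] = 1.0
    return T


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Two wires (i, j) enter the copy node, one wire (k) leaves: sum_ij x_i y_j delta_ijk = x_k y_k.
hadamard = np.einsum("i,j,ijk->k", x, y, copy_tensor(3, 3))
print(hadamard.tolist())            # [4.0, 10.0, 18.0]
print(np.allclose(hadamard, x * y)) # True
print(copy_tensor(3, 1).tolist())   # [1.0, 1.0, 1.0]
```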
This is beautiful! I see we even settled on the same "vectorization tensor" and notation, which makes the Khatri-Rao product, and all the other "matrix products", much more intuitive!
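For readers who haven't met it, the Khatri-Rao product is the column-wise Kronecker product. Here is a minimal NumPy sketch; `khatri_rao` is a hypothetical helper, and reading the final reshape as the "vectorization tensor" is my interpretation rather than something spelled out in the thread. In the einsum, the shared `k` edge is kept in the output instead of being summed (a hyperedge, i.e. a copy tensor), and the reshape fuses the `i` and `j` edges into one.

```python
import numpy as np


def khatri_rao(A, B):
    # Column-wise Kronecker product: A is (I, K), B is (J, K), result is (I*J, K).
    I, K = A.shape
    J, K2 = B.shape
    assert K == K2, "A and B need the same number of columns"
    # k appears in both inputs and the output, so it is copied, not contracted.
    # The row-major reshape merges (i, j) into a single edge of size I*J.
    return np.einsum("ik,jk->ijk", A, B).reshape(I * J, K)


A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(4, 3)
C = khatri_rao(A, B)

# Reference: Kronecker product of matching columns, stacked side by side.
reference = np.stack([np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])], axis=1)
print(C.shape)                    # (8, 3)
print(np.allclose(C, reference))  # True
```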
I had been working on a library for visualizing tensor contractions (so-called Penrose tensor diagrams), https://github.com/Quantum-Flytrap/tensor-diagrams, with an example (a tutorial stub) https://p.migdal.pl/art-tensor-diagrams/.

My background is in quantum physics, and it is a useful tool for understanding some aspects of entanglement. Here, the idea was to show its use in machine learning in general and deep learning in particular.

Awesome! Are you able to handle sums of diagrams like (-M- + -a b-)-c etc?
Thanks! Right now, it does not.

I focus on the common case of a single product without any other parts. If there is a summation, it is managed via indices.

If you are interested in arbitrary drawings, you are probably aware of TikZ (if you like coding) or Excalidraw.

I use a similar notation, but I've never quite found a satisfactory one for elementwise operations (e.g. `-M-a + -b`, especially broadcast ones, which I end up writing as `-A-B- + -c 1-`) or for denoting what a derivative is taken with respect to. Using differentials gets around some of the latter, but I was still never quite satisfied. Any chance you've found nice options there?
The broadcasting using "1-" (i.e. the order-1 copy/Kronecker tensor) is actually pretty nice, I think. It allows a lot of nice manipulations and makes the low rank of the matrices really clear.
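A small NumPy sketch of the `-A-B- + -c 1-` pattern, under the assumption that `c` sits on the row edge and the all-ones vector (the order-1 copy tensor) on the column edge. The diagram `-c 1-` is then the outer product of `c` with ones, which is exactly what NumPy's broadcasting of `c[:, None]` computes, and its rank-1 structure is visible directly. Shapes and random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 4))
c = rng.standard_normal(3)
ones = np.ones(B.shape[1])  # order-1 copy tensor on the column edge

# "-c 1-": outer product of c with the all-ones vector, a rank-1 matrix.
c_one = np.einsum("a,b->ab", c, ones)
result = A @ B + c_one

print(np.allclose(result, A @ B + c[:, None]))  # True: same as NumPy broadcasting
print(np.linalg.matrix_rank(c_one))             # 1
```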

Addition does feel a bit hacky, but it may just be necessary for any associative notation. At least it means that rules such as distributivity work the same as in classical notation.
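As a concrete check of distributivity, take the sum of diagrams asked about above, (-M- + -a b-)-c. Contracting `c` into the sum gives the same vector as contracting it into each term separately, and in the second term the `b`-`c` wire closes into a scalar that just scales `a`. This is a NumPy sketch with random data; the shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 4))
a = rng.standard_normal(3)
b = rng.standard_normal(4)
c = rng.standard_normal(4)

# (-M- + -a b-)-c: contract c into the sum of the two diagrams...
lhs = np.einsum("ij,j->i", M + np.einsum("i,j->ij", a, b), c)
# ...or distribute: contract c into each diagram, where b-c closes into a scalar.
rhs = np.einsum("ij,j->i", M, c) + a * np.einsum("j,j->", b, c)

print(np.allclose(lhs, rhs))  # True
```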

I also wrote the tensorgrad library (https://github.com/thomasahle/tensorgrad) to do symbolic calculations using tensor networks. It keeps track of what you are taking derivatives with respect to, but I agree it would be nice to show this more explicitly.

I just opened your book, very nice! I really like the derivatives. You went above and beyond with LaTeX diagrams ^^
At this point I feel like I need to write a TikZ library just for tensor diagrams...
There are some libraries, but the style is so not standardized that I understand your desire to write your own.

Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never got around to it (and manim back then was less well-documented than it is now).

> There are some libraries, but the style is so not standardized that I understand your desire to write your own.

Do you have any recommendations? I'm currently using TikZ's graphdrawing library because it supports subgraphs, which I need for handling addition, functions, and derivatives. However, the force-based layout doesn't work with subgraphs, which causes a lot of trouble.

> Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never got around to it (and manim back then was less well-documented than it is now).

Manim is on the TODO list :-) (see https://github.com/thomasahle/tensorgrad/issues/7). If you feel like writing some code, I think it could work very well with tensorgrad's step-by-step derivations.

I don't have specific recommendations. I mostly use libraries for drawing quantum computing circuits, but they're not great for tensor networks, even though they share some features. Man, we think alike. Btw, I've seen you work with Patrick; we used to share an office at the IQC (we were both postdocs there). Tell him I said hi! :)