by keithalewis 504 days ago
This is incomplete, incorrect, and irrelevant. Standard notation already exists. I'm sure it is fun to draw squiggly lines and some people enjoy reinventing the wheel. Spend some time learning what others have taught us before striking out on your own lonely path.
2 comments

This is standard notation that's been used for decades.

https://arxiv.org/abs/2402.01790v1

This paper motivates and explains concepts much better than the Tensor Cookbook.
I'm hoping the Tensor Cookbook can become as engaging to read for others as Jordan Taylor's paper was to me. If you have any thoughts on where I lose people, please share!
The cookbook is a work in progress by the looks of it.
"This book aims to standardize the notation for tensor diagrams..." https://youtu.be/zELbzXAmcUA?t=73
Tensor diagrams are standard, but some notation is missing. My goal was to be able to handle the entire Matrix Cookbook.

For this I needed a good notation for functions applied to specific dimensions and broadcasting over the rest. Like softmax in a transformer.
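To make "applied to specific dimensions and broadcasting over the rest" concrete, here is a minimal sketch in plain Python. The helper names (`softmax`, `map_last_axis`) and the example scores are made up for illustration; they are not from the book or from any library.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def map_last_axis(fn, t):
    """Apply fn to every innermost list of a nested list; outer axes are broadcast over."""
    if t and isinstance(t[0], list):
        return [map_last_axis(fn, sub) for sub in t]
    return fn(t)

# Hypothetical attention scores with shape (batch=2, query=2, key=3).
scores = [[[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]],
          [[3.0, 0.0, 0.0], [0.5, 0.5, 2.0]]]

# Softmax acts on the key axis only; batch and query axes just come along for the ride.
weights = map_last_axis(softmax, scores)
```

The function sees one axis and is oblivious to the others, which is the part a diagram notation has to express somehow.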

The function chapter is still under development in the book though. So if you have any good references for how it's been done graphically in the past, that I might have missed, feel free to share them.

You can do broadcasting with a tensor, at least for products and sums. The product is multilinear, and a sum can be done in two steps, the first using a tensor to implement fanout. Though I can see the value in representing structure that can be used more efficiently versus just another box for a tensor. Anything beyond that (softmax?) seems kind of awkward, since you're outside the domain of your "domain specific language". I don't know why it's needed to extend the Matrix Cookbook to tensor diagrams.
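One possible reading of "broadcasting with a tensor", sketched in plain Python. The copy tensor `delta3` and both function names are my own labels, and this is only my interpretation of the comment: a broadcast product becomes a contraction with an order-3 copy (fanout) tensor, and a broadcast sum first fans each vector out with a ones vector and then adds.

```python
def delta3(n):
    """Order-3 copy tensor: 1 where all three indices agree, else 0."""
    return [[[1 if i == j == k else 0 for k in range(n)]
             for j in range(n)] for i in range(n)]

def broadcast_product(M, v):
    """C[i][j] = sum_{k,l} M[i][k] * delta[k][l][j] * v[l], which equals M[i][j] * v[j]."""
    n = len(v)
    d = delta3(n)
    return [[sum(M[i][k] * d[k][l][j] * v[l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(len(M))]

def broadcast_sum(a, b):
    """C[i][j] = a[i] * 1[j] + 1[i] * b[j]: fan out with ones vectors, then add."""
    ones_a = [1] * len(a)
    ones_b = [1] * len(b)
    return [[a[i] * ones_b[j] + ones_a[i] * b[j] for j in range(len(b))]
            for i in range(len(a))]

prod = broadcast_product([[1, 2], [3, 4]], [10, 100])
total = broadcast_sum([1, 2], [10, 20, 30])
```

Both stay multilinear in their inputs, which is why they fit in a diagram as ordinary boxes. Softmax does not have that property.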
I come back to this every few months and do some work trying to make sense of how tensors are used in machine learning. Tensors, as used in physics and whose notation these tools inherit, are there for coordinate transforms and nothing else.

Tensors, as used in ML, are much closer to a key-value store with composite keys and scalar values, with most of the complexity coming from deciding how to filter on those composite keys.
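A rough sketch of that key-value view, using a plain dict with tuple keys. The names `select` and `sum_over` are hypothetical, chosen only for this illustration.

```python
# Hypothetical 2x3 tensor stored as {(row, col): value}.
t = {(i, j): 10 * i + j for i in range(2) for j in range(3)}

def select(tensor, axis, value):
    """Slice: keep entries whose key equals value on axis, dropping that key part."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in tensor.items() if k[axis] == value}

def sum_over(tensor, axis):
    """Reduce: group by the remaining key parts and add up the values."""
    out = {}
    for k, v in tensor.items():
        rest = k[:axis] + k[axis + 1:]
        out[rest] = out.get(rest, 0) + v
    return out

row1 = select(t, 0, 1)       # filter on the first key component
row_sums = sum_over(t, 1)    # aggregate away the second key component
```

Slicing is a filter on one part of the key and reduction is a group-by on the rest, with no coordinate transforms anywhere.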

Drop me a line if you're interested in a chat. This is something I've been thinking about for years now.

Highly recommend this note by Jordan Taylor.
Do point us at this standard notation.
The author also seems to be unaware of Fréchet derivatives.
I don't exactly know what you mean but from your hint I found the uh, clarifying bedtime story:

https://arxiv.org/abs/2302.09687

(On functions of 3rd-order "tensors")

((Whereas matrix-functions are of 2nd-order "tensors"))

Playground: https://gitlab.com/katlund/t-frechet

(MATLAB)

The Wikipedia page on this is sufficient. If F:X -> Y is a function between normed linear spaces then DF:X -> L(X,Y), where L(X,Y) is the vector space of bounded linear operators from X to Y, satisfies F(x + h) = F(x) + DF(x)h + o(||h||) as h -> 0. A function is differentiable if it can be locally approximated by a linear operator.
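A quick numerical sanity check of that definition, as a sketch. The map F below is an arbitrary example I picked, and DF is its derivative worked out by hand; the point is only that the remainder shrinks faster than h does.

```python
import math

def F(x):
    """Hypothetical example map R^2 -> R^2."""
    return [x[0] * x[1], x[0] + x[1] ** 2]

def DF(x):
    """Derivative of F at x, returned as the linear map h -> DF(x)h."""
    def apply(h):
        return [x[1] * h[0] + x[0] * h[1], h[0] + 2 * x[1] * h[1]]
    return apply

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x, h):
    """||F(x+h) - F(x) - DF(x)h|| / ||h||, which should tend to 0 as h -> 0."""
    xh = [a + b for a, b in zip(x, h)]
    lin = DF(x)(h)
    r = [fxh - fx - l for fxh, fx, l in zip(F(xh), F(x), lin)]
    return norm(r) / norm(h)

x = [1.0, 2.0]
# Shrink h by a factor of 10 each time; the ratio shrinks along with it.
ratios = [remainder_ratio(x, [s, -s]) for s in (1e-1, 1e-2, 1e-3)]
```

Note that DF(x) is returned as a function of h rather than as a matrix, which matches the DF:X -> L(X,Y) signature.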

Some confusion arises from the difference between f:R -> R and f':R -> R. The Fréchet derivative of f is Df:R -> L(R,R), where Df(x)h = f'(x)h. Row vectors and column vectors are just a clumsy way of thinking about this.
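The type difference is easy to see in code. A tiny sketch, with f(x) = x^3 chosen arbitrarily as the example:

```python
def f(x):
    return x ** 3

def f_prime(x):
    """Ordinary derivative: returns a number for each x."""
    return 3 * x ** 2

def Df(x):
    """Fréchet derivative: returns a linear map R -> R for each x."""
    return lambda h: f_prime(x) * h

slope = f_prime(2.0)     # a number
linear_map = Df(2.0)     # a function of h
step = linear_map(0.5)   # Df(2)h with h = 0.5
```

In one dimension the linear map is just "multiply by a number", so it is tempting to identify the two, but they are different kinds of object.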

BTW, all you need in order to publish on arxiv.org is to know a FoF. There is no rigorous peer review. https://arxiv.org/abs/1912.01091, https://arxiv.org/abs/2009.10852.

What content about Fréchet derivatives do you think would be useful to include?