Hacker News new | ask | show | jobs
by dan-robertson 531 days ago
I was pretty confused by this for a while. I think the context I was missing is that this is about a function in nympy called ‘einsum’ which is somewhat related to Einstein summation notation.

To write a little more: there are two things people mean by ‘tensor’. One is a kind of geometric object that corresponds to a multilinear map between vector spaces (or modules, I suppose), and another is a array indexed by k-tuples (ie what would be called a multidimensional array in C or Fortran programming). For a given choice basis, one can represent a degree k tensor with a k-dimensional array of scalars.

Only certain operations make sense geometrically on tensors (in the sense that they do not depend on the choice of basis) and these can be broken down into:

- tensor products, which take degree n and m tensors and output a degree (n + m) tensor

- contractions which take a degree n tensor and output a degree (n-2) tensor

- generalized transpositions which take a degree n tensor and output a degree n tensor

A matrix multiplication can be seen as a composition of a tensor product and a contraction; a matrix trace is just a contraction.

The Einstein summation convention is a notation which succinctly expresses these geometric operations by describing what one does with the ‘grid of numbers’ representation, combined with the convention that, if an index is repeated in a term twice (an odd number bigger than 1 is meaningless, an even number is equivalent to reapplying the ‘twice’ rule many times) one should implicitly sum the expression for each basis vector for that index. You get: tensor products by juxtaposition, contractions by repeated indexes, and transpositions by reordering indexes.

In numpy, it is for general computation rather than expressing something geometric so one doesn’t need the restrictions on number of times an index occurs. Instead I guess the rule is something like:

- if index is only on lhs, sum over it

- if index on lhs and rhs then don’t sum

- if index only on rhs or repeated on rhs, error

And computationally I guess it’s something like (1) figure out output shape and (2):

  for output_index of output:
    p = 1
    for (input, input_indexes) of (inputs, lhs):
      p = p * input[input_indexes(output_index)]
    output[output_index] = p
3 comments

And one of the interesting (and hard) computer science problems is how to turn an arbitrary einsum expression into the most efficient series of operations (transpose, matmul, etc.) available in hardware kernels. This is mentioned in the post under "How Contractions Are Actually Computed".
I mostly agree.

> an even number is equivalent to reapplying the ‘twice’ rule many times

I never heard that extension and in my opinion is a bad idea. One nice property of the Einstein summation notation is that you can reorder the tensors(with their index), and the result does no change. If you allow 4x repetitions this is not valid).

---

Also, a nice property of tensors written in paper is that you can write the index as subscripsts or superscripts to remember how they change when there is a base change. You can only colapse a subscripst with a superscript, so the result of the operation does not depend on the choice of the base. (It would be very nice that Python can remember and check this too to avoid silly error.)

Ah I think I’m just wrong and the 4x thing shouldn’t be allowed
Academic CS, moreso than any of the other STEM subjects, is like that whose line is it anyway meme - all the words are completely made up and absolutely none of the results matter.

So a vector isn't a vector, a tensor isn't a tensor, einsum isn't actually Einstein summation, automatic differentiation often gives derivatives for things that aren't differentiable, a neural network has nothing to do with neurons, etc etc etc

CS people think it's a crime that humanities people make up words in order to publish but they're really living in an extremely glassy glass house.

Yes, I think it comes from the kind of metaphorical language that writing programs induces. If artists "kinda lie" when describing how their art represents some emotion or reality, likewise, computer scientists are also engaged in this 'metaphorical representation' game -- unlike the science, which aim to actually represent, not metaphorically.

This also drives me mad reading many a computer science paper now -- the disrespect for the ordinary meaning of words is only so obscene in the most radical kinds of poetry. This feels wholly out of place in anything called a 'science', and leads to endless amounts of confusion.

There are none in all of academia so brazen in a poetic use of language and at the same time so incurious as to its meaning.

Vec4 is a vector, but yes C++ base vec is not.

In some ways CS is more correct with the modern (abstract) algebra definition of a vector, vs the historical didactics arrow concept.

Tensor has been abused, it is a property of bilinear maps, and as matrix multiplication is a bilinear operation there is a concept overlap there.

Vectors are really just elements of a vector space, not necessarily arrows.