by PeterisP 2389 days ago
By default, there's no attempt to connect a particular semantic meaning with a particular dimension. It's worth noting that all the popular methods of calculating word vectors can/will give results that differ by an arbitrary linear transformation, so even if they contain a very particular "semantic bit", it's still likely to be "smeared" across all 200 dimensions. You could have a linear (unambiguous, reversible) transformation to a different 200-dimension space where that particular factor is isolated in a separate dimension, but you would have to explicitly try and do that.

So the default situation is that each individual dimension means "nothing and everything". If you had some specific factors that you know beforehand and want to determine, then you could calculate a transformation to project all the vectors to a different vector space where #1 means thing A, #2 means thing B, etc. For example, there are some nice experiments with 'face' vectors that can separate age/masculinity/hair length/happiness/etc. out of the initial vectors coming from some image analysis neural network, where the meaning of each separate dimension is unclear.
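
A minimal sketch of the "smearing" point, assuming a toy 2-dimensional embedding where dimension 0 happens to hold a single semantic factor. The matrix `M`, the vectors, and the helper names are all made up for illustration; real word vectors would have around 200 dimensions, and the isolating transformation would have to be fitted rather than known in advance.

```python
# Toy illustration: an invertible linear map hides an axis-aligned factor,
# and its inverse recovers it. All values here are hypothetical.

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    """Invert a 2x2 matrix; raises if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# "Clean" space: dimension 0 is the factor of interest (1.0 = present).
clean = [[1.0, 0.3], [1.0, -0.8], [0.0, 0.5], [0.0, -0.2]]

# An arbitrary invertible transformation, standing in for whatever basis
# a training run happened to land on.
M = [[0.6, -0.7], [0.8, 0.4]]
M_inv = inverse_2x2(M)

smeared = [matvec(M, v) for v in clean]        # factor mixed into both dims
recovered = [matvec(M_inv, v) for v in smeared]  # factor isolated again

for s, r in zip(smeared, recovered):
    print([round(x, 3) for x in s], "->", [round(x, 3) for x in r])
```

In the smeared coordinates neither dimension alone tells you whether the factor is present, yet nothing has been lost: applying `M_inv` puts it back on its own axis.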


Even without a priori semantic goals, I think you can also transform/rotate your dimensions to maximize their interpretability.

Simplified example: if your two-dimensional system gives you two points:

    -0.5, 0.5
    0.5, 0.5
Then you losslessly rotate to

    0, 1.0
    1.0, 0
With the idea that the latter is simpler for humans to assign semantics to.
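
A quick check of the arithmetic in the example above, as a sketch using only Python's `math` module (the `rotate` helper is hypothetical). It suggests that a 45-degree clockwise rotation puts each point on an axis at length about 0.707, so landing exactly on 1.0 also takes a uniform scale by sqrt(2). Both steps are invertible, so the transformation is still lossless.

```python
import math

def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise by the given angle."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

points = [(-0.5, 0.5), (0.5, 0.5)]

# Rotate 45 degrees clockwise so each point lands on an axis.
rotated = [rotate(p, -45) for p in points]

# A pure rotation preserves vector length, so reaching exactly 1.0
# additionally needs a uniform scale by sqrt(2).
scaled = [(x * math.sqrt(2), y * math.sqrt(2)) for x, y in rotated]

for p, r, s in zip(points, rotated, scaled):
    print(p, "->", tuple(round(v, 3) for v in r),
          "->", tuple(round(v, 3) for v in s))
```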
This is the idea behind non-negative matrix formulation (NMF). As the name implies, it forces the entries of the embedding matrices (for both the reduced document and term matrix) to be nonnegative, which results in a more interpretable “sum of parts” representation. You can really see the difference (compared to LSA/SVD/PCA, which does not have this constraint) when it’s applied to images of faces. Also, NMF has been shown to be equivalent to word2vec. The classic paper is here: http://www.cs.columbia.edu/~blei/fogm/2019F/readings/LeeSeun...

PS—There should be a negative sign on the (2,2) entry of the first matrix.
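
For anyone who wants to see the non-negativity constraint in action, here is a rough sketch of NMF using the multiplicative update rules commonly attributed to Lee and Seung for the squared-error objective. Everything in it is an assumption for illustration: the toy matrix `V`, the rank, the iteration count, and the helper names. A real application would use a library implementation on a proper term-document or image matrix.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V into non-negative W and H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical 4x4 matrix built from two non-overlapping "parts".
V = [[2, 1, 0, 0],
     [4, 2, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 6]]

W, H = nmf(V, rank=2)
print("reconstruction error:", frob_error(V, W, H))
```

Because every update only multiplies by non-negative ratios, entries of `W` and `H` can never go negative, which is where the additive "sum of parts" reading comes from.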

> non-negative matrix formulation (NMF)

*factorization ;)

Also, PCA follows a similar idea (I mean, rotating vectors), but it's usually done in a much lower-dimensional space.

Ugh, that one was auto-correct, I swear. I have no idea what’s going on at Apple’s NLP department.
Is this the same intent that a 'variational autoencoder' would have?

Also, is it possible in non-variational implementations (like this one) that some of the dimensions represent multiple groups? For example, not just 0.5 and -0.5 groups, but also a 0.0 group in the middle. Then your rotation wouldn't be sufficient, you would need to increase the dimensionality to cleanly separate the groups.
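
A small sketch of why rotation alone would not separate three such groups, assuming they sit at -0.5, 0.0 and 0.5 along one dimension of a 2-D space. Rotation keeps collinear points collinear, so they can never end up on separate axes. The `one_hot` lift at the end is just one hypothetical (and nonlinear) way to add dimensions, not something the implementation under discussion does.

```python
import math

def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise by the given angle."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def collinear_with_origin(points, tol=1e-12):
    """True if all points lie on a single line through the origin."""
    nonzero = [p for p in points if abs(p[0]) > tol or abs(p[1]) > tol]
    if not nonzero:
        return True
    x0, y0 = nonzero[0]
    return all(abs(x0 * y - y0 * x) <= tol for x, y in nonzero)

# Three groups spread along a single dimension.
groups = [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0)]

# No rotation angle pulls the groups onto different axes.
still_collinear = all(
    collinear_with_origin([rotate(p, angle) for p in groups])
    for angle in range(0, 360, 15)
)
print("collinear under every tested rotation:", still_collinear)

def one_hot(value, centers=(-0.5, 0.0, 0.5)):
    """Hypothetical lift: one indicator dimension per group."""
    nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]

print([one_hot(x) for x, _ in groups])
```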