Hacker News new | ask | show | jobs
by intelkishan 243 days ago
I have seen this particular work example to work. You don't get the exact match but the closest one is indeed Queen.
2 comments

Yes but it doesn't generalize very well. Even on simple features like gender. If you go look at embeddings you'll find that man and woman are neighbors, just as king and queen are[0]. This is a better explanation for the result as you're just taking very small steps in the latent space.

Here, play around[1]

  mother - parent + man = woman
  father - parent + woman = man
  father - parent + man = woman
  mother - parent + woman = man
  woman - human + man = girl
Or some that should be trivial

  woman - man + man = girl
  man - man + man = woman
  woman - woman + woman = man
  
Working in very high dimensions is funky stuff. Embedding high dimensions into low dimensions results in even funkier stuff

[0] https://projector.tensorflow.org/

[1] https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/

Thank you for the comment!

This led me to do a bit more research, and I see indeed the queen result is in itself infact "cheating" a bit: https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...

#TheMoreYouKnow

so addition is not associative?
I think you're missing the point
It's a pretty exotic type of addition that would lead to the second set of examples, just trying to get an idea of its nature.
Calling it addition is hairy here. Do you just mean an operator? If so, I'm with you. But normally people are expecting addition to have the full abelian group properties, which this certainly doesn't. It's not a ring because it doesn't have the multiplication structure. But it also isn't even a monoid[0] since, as we just discussed, it doesn't have associativity nor unitality.

There is far less structure here than you are assuming, and that's the underlying problem. There is local structure and so the addition operation will work as expected when operating on close neighbors, but this does greatly limit the utility.

And if you aren't aware of the terms I'm using here I think you should be extra careful. It highlights that you are making assumptions that you weren't aware were even assumptions (an unknown unknown just became a known unknown). I understand that this is an easy mistake to make since most people are not familiar with these concepts (including many in the ML world), but this is also why you need to be careful. Because even those that do are probably not going to drop these terms when discussing with anyone except other experts as there's no expectation that others will understand them.

[0] https://ncatlab.org/nlab/show/monoid

I think you misinterpreted the tone of my original comment as some sort of gotcha. Presumably you're overloading the addition symbol with some other operational meaning in the context of vector embeddings. I'm just calling it addition because you're using a plus sign and I don't know what else to call it, I wasn't referring to addition as it's commonly understood which is clearly associative.
Shouldn't this itself be a part of training?

Having set of "king - male + female = queen" like relations, including more complex phrases to align embeddings.

It seems like terse, lightweight, information dense way to address essence of knowldge.