| HN Mirror

> That's a property of Word2Vec specifically due to how it's trained (a shallow network where most of the "logic" would be contained within the embeddings themselves).

Is it, though? I thought LLM-based embeddings were even more fun for this, since they give you many more interesting directions to move in. That is, not just:

emb("king") - emb("man") + emb("woman") ≈ emb("queen")
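The equality above only holds approximately: you compute the combined vector and then look up the nearest word to it. A minimal Python sketch of that lookup, assuming tiny hand-made 3-d vectors (the `EMB` table, its dimensions, and the `analogy` helper are hypothetical; real Word2Vec vectors are learned and have a few hundred dimensions). The input words are excluded from the search, as is commonly done, because otherwise the nearest vector is often one of the inputs itself.

```python
import math

# Hypothetical toy "embeddings", NOT real Word2Vec output.
# Dimensions loosely stand for: [royalty, maleness, femaleness]
EMB = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, emb=EMB):
    """Return the word whose vector is closest to emb[a] - emb[b] + emb[c],
    skipping the three input words."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # queen
```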

But also e.g.:

emb(<insert a positive book review a couple of paragraphs long>) + a*v(sad) + b*v(short) - c*v(positive) ≈ emb(<a single-paragraph, negative and depressing review>)

Here a, b and c are constants to tweak, and v(X) is a vector for quality X, which you can get by embedding a bunch of texts that express X and averaging them (or with some other dimensionality-reduction trickery).
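A toy Python sketch of that recipe, assuming every embedding is an invented 4-d vector rather than the output of any real model (no particular embedding model or API is implied), and that `mean`, `steer` and `nearest` are hypothetical helpers. Since the sketch has no decoder to turn the steered vector back into text, it instead picks the closest text from a small made-up candidate pool.

```python
import math

# Hypothetical hand-made embeddings standing in for a real text-embedding
# model. Toy dimensions: [positivity, sadness, length, about-a-book]
SAD_TEXTS = [[-0.2, 0.9, 0.3, 0.1], [-0.4, 0.8, 0.5, 0.0]]
SHORT_TEXTS = [[0.1, 0.0, -0.9, 0.2], [-0.1, 0.1, -0.7, 0.0]]
POSITIVE_TEXTS = [[0.9, -0.3, 0.2, 0.1], [0.8, -0.1, 0.4, 0.3]]

# Candidate texts to search over (embeddings also made up).
CANDIDATES = {
    "long positive book review": [0.8, -0.2, 0.8, 0.9],
    "short negative book review": [-0.6, 0.8, -0.5, 0.8],
    "short cheerful book review": [0.8, -0.3, -0.6, 0.8],
    "sad poem": [-0.5, 0.9, -0.2, -0.3],
}

def mean(vectors):
    """v(X): element-wise average of the embeddings of texts expressing X."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def steer(base, weighted_directions):
    """Return base + sum(weight * direction) over (weight, direction) pairs."""
    out = list(base)
    for weight, direction in weighted_directions:
        out = [x + weight * d for x, d in zip(out, direction)]
    return out

def nearest(target, pool=CANDIDATES):
    """Name of the candidate whose embedding is most cosine-similar to target."""
    return max(pool, key=lambda name: cosine(pool[name], target))

a, b, c = 1.0, 1.0, 1.0  # the knobs to tweak
v_sad, v_short, v_positive = mean(SAD_TEXTS), mean(SHORT_TEXTS), mean(POSITIVE_TEXTS)
target = steer(
    CANDIDATES["long positive book review"],
    [(a, v_sad), (b, v_short), (-c, v_positive)],
)
print(nearest(target))  # short negative book review
```

One variant worth trying is to subtract the mean embedding of some neutral texts from each v(X), so that the direction does not also carry whatever all the example texts have in common.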

I suggested this on HN some time ago, but was only told that I was confused and that the idea was not even wrong. Then, though, there was a talk at an AI conference recently [0] where the speaker demonstrated exactly this kind of latent-space translation of text in a language model.

[0] - https://www.youtube.com/watch?v=veShHxQYPzo&t=13980s - "The Hidden Life of Embeddings", by Linus Lee from Notion.