Hacker News
by picometer 909 days ago
In hindsight, reviewer f5bf’s comment is fascinating:

> - It would be interesting if the authors could say something about how these models deal with intransitive semantic similarities, e.g., with the similarities between 'river', 'bank', and 'bailout'. People like Tversky have advocated against the use of semantic-space models like NLMs because they cannot appropriately model intransitive similarities.

What I’ve noticed in the latest models (GPT, image diffusion models, etc.) is an ability to play with words when there’s a double meaning. This struck me as something that used to be very human, but is now in the toolbox of generative models. (Most of which, I assume, use something akin to word2vec for deriving embedding vectors from prompts.)

Is the word2vec ambiguity contributing to the wordplay ability? I don’t know, but it points to a “feature vs bug” situation where such an ambiguity is a feature for creative purposes, but a bug if you want to model semantic space as a strict vector space.
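
To make the reviewer's intransitivity objection concrete, here is a rough worked bound. It assumes each word gets exactly one fixed vector and that similarity is cosine similarity; both are assumptions for illustration, not something the review states. The angle between vectors obeys the triangle inequality, so two high similarities force a floor on the third:

```latex
\theta(u, v) = \arccos\!\left(\frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}\right),
\qquad
\theta(\text{river}, \text{bailout}) \le \theta(\text{river}, \text{bank}) + \theta(\text{bank}, \text{bailout})

% Example: if both similarities to "bank" are 0.9, then
\cos\theta(\text{river}, \text{bailout})
  \ge \cos\!\bigl(2 \arccos 0.9\bigr)
  = 2(0.9)^2 - 1
  = 0.62
```

So under those assumptions 'river' and 'bailout' cannot both be very close to 'bank' without also being fairly close to each other, which is the intransitivity problem in the quote.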

My interpretation here is that the word/prompt embeddings in current models are so huge that they’re overloaded with redundant dimensions, such that they wouldn’t satisfy any mathematical formalism (e.g. that of a well-behaved vector space) at all.

2 comments

The key difference is what I'd call "context-free embeddings" vs "contextual embeddings". Because of how they are built, word2vec and similar methods have to assign every single "bank" in every sentence the exact same vector, but later models (e.g. all the transformer models, BERT, GPT, etc.) will assign wildly different vectors to "bank" depending on the surrounding words for that particular mention of "bank".
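
A minimal sketch of the context-free case, in case it helps. The table `STATIC` and its 2-d vectors are made up for illustration and are not real word2vec output; the point is only that a lookup table cannot see the rest of the sentence.

```python
# Toy static embedding table: hand-picked 2-d vectors, not trained values.
STATIC = {
    "river": (1.0, 0.0),
    "bank": (0.7, 0.7),
    "bailout": (0.0, 1.0),
    "the": (0.1, 0.1),
    "got": (0.1, 0.1),
    "a": (0.1, 0.1),
}


def embed_static(sentence):
    """Context-free lookup: every token maps to one fixed vector."""
    return [STATIC[tok] for tok in sentence.split()]


s1 = "the river bank"
s2 = "the bank got a bailout"

bank_in_s1 = embed_static(s1)[2]  # "bank" is the third token of s1
bank_in_s2 = embed_static(s2)[1]  # "bank" is the second token of s2

# The lookup ignores neighbours, so both mentions get the same vector.
print(bank_in_s1 == bank_in_s2)
```

Whatever the sentence, `embed_static` returns the same tuple for "bank", so the single vector has to sit somewhere between the 'river' sense and the 'bailout' sense.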
Even small models (e.g. hidden dims = 32) should be able to handle token ambiguity with attention. The information is not so much in the token itself as in the context.
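
A rough illustration of that point, using one attention step in plain Python. Everything here is an assumption for the sake of the sketch: the vectors in `EMB` are hand-picked, the dimension is 2 rather than 32, and the query/key/value projections are the identity, whereas a real transformer learns them and stacks many layers.

```python
import math

# Hypothetical 2-d input vectors, chosen by hand for illustration.
EMB = {
    "river": [1.0, 0.0],
    "bailout": [0.0, 1.0],
    "bank": [0.7, 0.7],
}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V."""
    vecs = [EMB[t] for t in tokens]
    d = len(vecs[0])
    out = []
    for q in vecs:
        scores = [dot(q, k) / math.sqrt(d) for k in vecs]
        weights = softmax(scores)
        # Each output is a weighted mix of all token vectors in the context.
        out.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(d)])
    return out


bank_near_river = self_attention(["river", "bank"])[1]
bank_near_bailout = self_attention(["bank", "bailout"])[0]

print(bank_near_river)    # first component is the larger one
print(bank_near_bailout)  # second component is the larger one
```

The input vector for "bank" is identical in both calls, but the output is pulled toward whichever neighbour is present, so the disambiguating information comes from the context rather than from the token's own vector.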