Hacker News
by TeMPOraL 751 days ago
Yes, I'm making the weaker claim that concepts would generally sort themselves into roughly equivalent structures that could be mapped onto each other through some simple affine transformations (rotation, reflection, translation, etc.) applied to various parts of the structures.

Or, in other words, I think the absolute coordinates of any concept in the latent space are irrelevant, and it makes no sense to compare them between two models; what matters is the position of each concept relative to other concepts, and I expect those structures to be similar for large enough datasets of real text, even if the datasets are disjoint.
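
To make the "relative, not absolute" point concrete, here is a minimal sketch with made-up 2D points (not real model output). It covers only the distance-preserving case of rotation plus translation; a general affine map, such as one with scaling or shear, would not keep distances fixed. The names `rotate_translate` and `pairwise_distances` are illustrative.

```python
import math

def rotate_translate(point, angle, shift):
    """Apply a 2D rotation by `angle` (radians), then a translation by `shift`."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])

def pairwise_distances(points):
    """Euclidean distance for every unordered pair, in a fixed order."""
    return [math.dist(points[i], points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]

# Made-up "concept" coordinates standing in for model 1.
space_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-0.5, 1.5)]
# Stand-in for model 2: same structure, different absolute coordinates.
space_b = [rotate_translate(p, angle=1.1, shift=(5.0, -3.0)) for p in space_a]

dists_a = pairwise_distances(space_a)
dists_b = pairwise_distances(space_b)

# Every coordinate changed, yet every pairwise distance is (numerically) the same.
coords_differ = all(a != b for a, b in zip(space_a, space_b))
distances_match = all(math.isclose(a, b, abs_tol=1e-9)
                      for a, b in zip(dists_a, dists_b))
print(coords_differ, distances_match)
```

Comparing `space_a` and `space_b` coordinate by coordinate says nothing useful, while comparing their distance lists shows the two layouts are the same shape.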

(More specific prediction: take a typical LLM dataset, say Books3 or Common Crawl, randomly select half of it as dataset A, and use the remainder as dataset B. I expect that two models of the same architecture, one trained on dataset A and the other on dataset B, should end up with structurally similar latent spaces.)

> Something that really sold me when I was in a similar mindset was word2vec's king - man + woman = queen wasn't actually real or in the model. Just a way of explaining it simply.

Huh, it seems I took the opposite understanding from word2vec: I expect that "king - man + woman = queen" should hold in most models. What I mean by structural similarity could be described as such equations mostly holding across models for a significant number of concepts.
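
As a rough sketch of what "such equations mostly holding" could mean in practice, the snippet below runs the analogy as a nearest-neighbour lookup. The vectors are hand-made toy values, not word2vec output, and the `analogy` helper is hypothetical. Note that it excludes the three input words from the candidates, which is a common convention in analogy evaluations; without that exclusion the nearest vector can simply be one of the inputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(emb, a, b, c):
    """Return the word closest to emb[a] - emb[b] + emb[c], excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Toy vectors with invented axes (royalty, maleness, femaleness).
toy = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.2, 0.2),
}

result = analogy(toy, "king", "man", "woman")
print(result)  # queen
```

Running the same lookup over a list of analogies in two separately trained models, and counting how often both return the same answer, would be one way to score the cross-model version of the claim.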

1 comment

What would be an appropriate test?

- Given 2 sets of word embeddings,

- For each pair (A,B) of embeddings in one set,

- There exists a corresponding pair (A’,B’) in the other set,

- Such that dist(A,B) ≈ dist(A’,B’).

Something like that, to start. But it would also need to look at longer chains of relations.
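
A minimal sketch of that test, under one big simplifying assumption: the correspondence A → A’ is given, because both sets embed the same words. Finding the correspondence when it is not given is a much harder matching problem and is not attempted here. The score is the Pearson correlation between the two lists of pairwise distances; the data and the name `structure_score` are made up for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def structure_score(emb_1, emb_2):
    """Correlate dist(A,B) with dist(A',B') over all pairs of shared words."""
    shared = sorted(set(emb_1) & set(emb_2))
    pairs = list(combinations(shared, 2))
    d1 = [math.dist(emb_1[a], emb_1[b]) for a, b in pairs]
    d2 = [math.dist(emb_2[a], emb_2[b]) for a, b in pairs]
    return pearson(d1, d2)

# Toy 2D embeddings (invented values).
emb_1 = {"cat": (0.0, 0.0), "dog": (1.0, 0.0), "car": (4.0, 3.0), "bus": (5.0, 3.0)}
# emb_2: emb_1 rotated by 90 degrees and shifted, so the structure is unchanged.
emb_2 = {w: (-y + 10.0, x + 2.0) for w, (x, y) in emb_1.items()}
# emb_3: the same words placed in scrambled positions.
emb_3 = {"cat": (0.0, 0.0), "dog": (4.0, 3.0), "car": (1.0, 0.0), "bus": (0.0, 5.0)}

same = structure_score(emb_1, emb_2)       # close to 1.0
scrambled = structure_score(emb_1, emb_3)  # much lower
print(same, scrambled)
```

Pairwise distances only capture two-step relations. The "longer chains" would need something extra, for example comparing offsets such as (A - B) with (C - D) across models rather than distances alone.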