Hacker News
by therealdrag0 1142 days ago
How is each dimension maintained to have a sticky meaning among scenarios?
3 comments

Because the model used to compute the embeddings is the same across scenarios. You can infer meaning for each dimension by checking which inputs get embeddings that have large values for the dimension.
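A rough sketch of that inspection, in case it helps. The labels and the tiny 4-dimensional embedding matrix are made up for illustration (they do not come from any real model), and `top_inputs_for_dimension` is a hypothetical helper name:

```python
import numpy as np

# Toy data (assumed, not from a real model): one row per input, one column per dimension.
labels = ["sky", "ocean", "forest", "sunset", "blueberry", "desert"]
embeddings = np.array([
    [0.1, 0.3, 0.9, 0.2],  # sky
    [0.2, 0.1, 0.8, 0.4],  # ocean
    [0.7, 0.2, 0.1, 0.3],  # forest
    [0.3, 0.9, 0.2, 0.1],  # sunset
    [0.4, 0.2, 0.7, 0.6],  # blueberry
    [0.5, 0.8, 0.0, 0.2],  # desert
])

def top_inputs_for_dimension(embeddings, labels, dim, k=3):
    """Return the k labels whose embeddings have the largest value on `dim`."""
    order = np.argsort(embeddings[:, dim])[::-1]  # indices sorted by descending value
    return [labels[i] for i in order[:k]]

top = top_inputs_for_dimension(embeddings, labels, dim=2)
print(top)  # ['sky', 'ocean', 'blueberry']
```

In this toy case the inputs that score highest on dimension 2 are all blue things, which is the kind of pattern you would look for by eye with real embeddings.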

If the inputs are images, you may find that some dimension scores, e.g., how much blue there is in the image. Though often it's not that simple: there could be multiple dimensions that relate to how blue the image is, especially if the embedding dimensionality is large, which it does tend to be these days. You could reduce the embedding dimensionality first using PCA, and then see which input images correspond to high/low values of the first principal component, etc.
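Here is a minimal sketch of the PCA idea using plain NumPy. Everything about the data is an assumption for illustration: the embeddings are synthetic, with a hidden `blueness` factor deliberately smeared across all 16 dimensions plus a little noise. With real embeddings you would only have the matrix, and you would look at the inputs behind `highest` and `lowest` by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic embeddings (assumed): a hidden "blueness" factor spread over many dimensions.
n, d = 200, 16
blueness = rng.uniform(0, 1, size=n)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
embeddings = 5.0 * np.outer(blueness, direction) + 0.1 * rng.normal(size=(n, d))

# PCA via SVD of the mean-centered data; rows of vt are the principal directions.
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1_scores = centered @ vt[0]  # projection of each input onto the first component

# Indices of the inputs with the highest and lowest first-component scores.
highest = np.argsort(pc1_scores)[-5:]
lowest = np.argsort(pc1_scores)[:5]

# The sign of a principal component is arbitrary, so compare by absolute correlation.
corr = np.corrcoef(pc1_scores, blueness)[0, 1]
print(abs(corr))
```

No single raw dimension equals `blueness` here, but the first principal component recovers it almost exactly, up to sign.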

Dimensions themselves do not carry any meaning; what matters is the neighbors, which maintain a sense of similarity. Think of it like a very complex point cloud. Applying an n-dimensional rotation yields the same point cloud, content-wise.
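The rotation point can be checked numerically. This is a sketch on random points rather than real embeddings; the orthogonal matrix comes from a QR decomposition, so it may include a reflection as well as a rotation, which preserves distances equally well. The helper names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 8))  # stand-in for 50 embeddings of dimension 8

# Random orthogonal matrix (rotation, possibly combined with a reflection).
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
rotated = points @ q

def pairwise_distances(x):
    """Euclidean distance between every pair of rows."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def nearest_neighbor(x):
    """Index of the closest other point for each row."""
    dist = pairwise_distances(x)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.argmin(axis=1)

coordinates_changed = not np.allclose(points, rotated)
same_distances = np.allclose(pairwise_distances(points), pairwise_distances(rotated))
same_neighbors = np.array_equal(nearest_neighbor(points), nearest_neighbor(rotated))
print(coordinates_changed, same_distances, same_neighbors)
```

Every individual coordinate changes under the transform, yet all distances and nearest neighbors stay the same, which is why a single axis need not mean anything on its own.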

As for the number of dimensions, in a sense it is a training variable just like the content itself. The more dimensions you use for your embeddings, the more complex the relations you can capture during clustering. Too many dimensions can easily lead to overfitting, however, and too few usually cannot accurately represent the training corpus.

All the embeddings (vectors) are usually generated at the same time, and regenerated periodically. Does this answer your question?