by VHRanger 827 days ago
Fair enough.

I think people skip over the fact that the vectors are the result of minimizing the objective.

That objective has been roughly the same since word2vec. GloVe is nearly mathematically equivalent, and the LLM training objective is the same kind of thing.

For an LM, the objective function is still roughly the same: maximize the probability of the next token conditional on the previous tokens.
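
Written out, the two objectives look roughly like this. The notation is an assumption for illustration: x_t is the token at position t, m is the skip-gram window size, h_t is the model's hidden state after reading x_{<t}, e_v is the output embedding of vocabulary item v, and the softmax output layer is the usual setup rather than something every model must use.

```latex
% Skip-gram (word2vec): predict nearby tokens from the center token
\max_\theta \sum_{t} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log p_\theta(x_{t+j} \mid x_t)

% Language model: predict the next token from all previous tokens
\max_\theta \sum_{t} \log p_\theta(x_t \mid x_{<t}),
\qquad
p_\theta(x_t = v \mid x_{<t}) = \frac{\exp(e_v^\top h_t)}{\sum_{v'} \exp(e_{v'}^\top h_t)}
```

Both are log-likelihoods of a token given context, scored by a dot product against an embedding. The main difference is how the context is summarized.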

This means the embedding vector of a token minimizes distance to tokens that often come before it, and maximizes distance to those that don't.
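
To make the pull/push effect concrete, here is a minimal toy sketch in pure Python, assuming a skip-gram negative-sampling style loss and measuring closeness by dot product rather than a true distance. The names `sgns_step`, `center`, `positive`, and `negative` are hypothetical, and this is not how any particular library implements it.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(center, positive, negatives, lr=0.1):
    """One SGD step on the toy loss
        -log sigmoid(c . p) - sum_n log sigmoid(-(c . n))
    where c is the center vector, p a co-occurring token's vector,
    and each n a vector for a token that does not co-occur.
    All vectors are updated in place.
    """
    grad_center = [0.0] * len(center)

    # Positive pair: d(loss)/d(c . p) = sigmoid(c . p) - 1, always negative,
    # so the update raises the dot product.
    g = sigmoid(dot(center, positive)) - 1.0
    for i in range(len(center)):
        grad_center[i] += g * positive[i]
        positive[i] -= lr * g * center[i]

    # Negative pairs: d(loss)/d(c . n) = sigmoid(c . n), always positive,
    # so the update lowers the dot product.
    for neg in negatives:
        g = sigmoid(dot(center, neg))
        for i in range(len(center)):
            grad_center[i] += g * neg[i]
            neg[i] -= lr * g * center[i]

    # Apply the accumulated gradient to the center vector last, so the
    # context updates above used the pre-step center.
    for i in range(len(center)):
        center[i] -= lr * grad_center[i]


random.seed(0)
dim = 8
center = [random.uniform(-0.5, 0.5) for _ in range(dim)]
positive = [random.uniform(-0.5, 0.5) for _ in range(dim)]
negative = [random.uniform(-0.5, 0.5) for _ in range(dim)]

before_pos = dot(center, positive)
before_neg = dot(center, negative)

for _ in range(200):
    sgns_step(center, positive, [negative])

after_pos = dot(center, positive)
after_neg = dot(center, negative)

print("co-occurring pair:    ", before_pos, "->", after_pos)
print("non-co-occurring pair:", before_neg, "->", after_neg)
```

After training, the score for the co-occurring pair goes up and the score for the other pair goes down, which is the whole geometry of the embedding space in miniature.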