by yellow_lead 945 days ago
Because of embeddings

Can you explain why without using the word "embeddings", in a way that convinces us that it's just autocomplete?

Or to put it another way: what is being embedded? Is it an abstract concept like question-answering?

I'm not an expert, but the way I think of it is that words can be converted to vectors, e.g. cat -> (1,8,23,34,32). Real vectors are much longer than this; it's just an example. The vector for the corresponding word in Spanish, French, or another language is actually quite similar. So when an LLM sees French, it can use its English training data to respond in French, because of this intermediate translation to vectors.
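To make the "similar vectors" idea concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the 5-number vectors (including the ones for "gato" and "chat"), the `toy_vectors` table, and the helper names `cosine_similarity` and `nearest`. Real models learn vectors with hundreds or thousands of dimensions, and these numbers don't come from any actual model. Cosine similarity is one common way to compare vectors; I'm assuming it here, not claiming it is what any particular LLM does internally.

```python
import math

# Toy 5-dimensional vectors, invented for illustration only.
# "cat" reuses the example numbers from the comment above; the rest are made up.
toy_vectors = {
    "cat": [1.0, 8.0, 23.0, 34.0, 32.0],   # English
    "gato": [1.2, 7.5, 22.0, 35.0, 31.0],  # Spanish, deliberately close to "cat"
    "chat": [0.9, 8.4, 24.0, 33.0, 33.0],  # French, deliberately close to "cat"
    "car": [30.0, 2.0, 1.0, 5.0, 12.0],    # unrelated word, deliberately far away
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    query = vectors[word]
    others = {w: v for w, v in vectors.items() if w != word}
    return max(others, key=lambda w: cosine_similarity(query, others[w]))


if __name__ == "__main__":
    for other in ("gato", "chat", "car"):
        score = cosine_similarity(toy_vectors["cat"], toy_vectors[other])
        print(f"cat vs {other}: {score:.3f}")
```

With these toy numbers, "gato" and "chat" score much closer to "cat" than "car" does, which is the property the cross-language argument relies on.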
This is supposed to be an explanation of why LLMs are just autocomplete. But what you're describing is the ability to make semantic connections between ideas across languages, and you're offering it as the reason question-answering is learned cross-language. You're talking about it like it's not mind-blowing, just by describing how the lookups work in a vector space of ideas and concepts.