|
|
|
|
|
by WorldMaker
44 days ago
|
|
Embeddings are still mostly just vectors into n-dimensional K-means clusters. It isn't "knowing" two things are related and here's the evidence, it is guessing two things are statistically likely to be related, based on trained patterns, and running with it without evidence. It has no "semantic understanding" as we would define it. It's just increasingly good at winning cluster lotteries because we've increased the amount of training data to incredible heights. |
|
Grouping vectors in concept space is exactly how you create semantic understanding. The proof is in how good they are at creating semantically valid text. The fact that it took massive amounts of data is irrelevant. That just shows how much knowledge is encoded in all our language. It takes humans a ton of training to know things too.