by fromthestart 2476 days ago
I'm afraid you misunderstand the way embeddings work, at least for BERT-based models, which are currently state of the art.

The embeddings a trained BERT model produces change with context, even though the trained weights themselves are fixed. In other words, if you feed it a paragraph about bank robbers and look at the encoding for "bank", it will be meaningfully different from the encoding of the same word in a paragraph (or sentence) about river banks.

We use BERT at the startup I work at, and one of our tests was the sentence "the bank robbers robbed the bank and then rested by the river bank". BERT generated three different, semantically meaningful encodings for the word "bank" in this sentence. The first two instances were much closer to each other in vector space (Euclidean distance) than either was to the last.
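For anyone who wants to try this kind of test, here is a rough sketch, not our actual code. It assumes the Hugging Face `transformers` and `torch` packages, the `bert-base-uncased` checkpoint, and the last hidden layer as the "encoding"; none of those choices are stated above, and the function names are made up for illustration. It also assumes the target word is a single wordpiece in the tokenizer's vocabulary.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(vectors):
    """Map each index pair (i, j) with i < j to the distance between vectors i and j."""
    return {
        (i, j): euclidean(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


def contextual_vectors(sentence, word="bank", model_name="bert-base-uncased"):
    """Return one output vector per occurrence of `word` in `sentence`.

    Assumptions (not from the comment above): checkpoint name, last hidden
    layer, and `word` being a single wordpiece.
    """
    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (num_tokens, hidden_size)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return [hidden[i].tolist() for i, tok in enumerate(tokens) if tok == word]


# Example usage (downloads the model on first run, so it is left commented out):
# vecs = contextual_vectors(
#     "the bank robbers robbed the bank and then rested by the river bank"
# )
# print(pairwise_distances(vecs))  # keys (0, 1), (0, 2), (1, 2)
```

The exact distances will depend on the checkpoint and on which layer you read from, so treat any numbers you get as specific to that setup.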

This is huge, because it is arguably the first step in building an AI which can perform basic reasoning about information encoded in text. For example, if you average the encodings of the words in a paragraph, you get a single vector that acts as a rough summary of the paragraph's meaning or topic. Simple vector math becomes a powerful reasoning tool.
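The averaging idea is just element-wise mean pooling. A minimal sketch in plain Python, where the tiny 3-dimensional vectors are invented for illustration (real BERT vectors have hundreds of dimensions) and the function names are hypothetical:

```python
import math


def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up token encodings for two toy "paragraphs".
paragraph_a = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
paragraph_b = [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8]]

summary_a = mean_pool(paragraph_a)
summary_b = mean_pool(paragraph_b)

# Comparing the pooled vectors gives a crude topic-similarity score.
print(cosine_similarity(summary_a, summary_b))
```

Plain averaging weights every token equally, so this is the simplest possible pooling choice rather than the only one.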

The future is here.

2 comments

> This is huge, because it is arguably the first step in building an AI which can perform basic reasoning about information encoded in text.

Well, except for the many, many decades of previous work on NLP using symbolic methods that are quite capable. Although DNNs are en vogue and have some amazing properties, we shouldn't forget that symbolic AI/NLP using explicitly semantic representations is powerful, has a rich history, and complements DNNs quite well -- it is easily explainable, for one.

The contextualized word embeddings you get out of BERT are still generated from fixed per-word vectors. And while you get one output vector for each input vector, that doesn't mean they correspond to each other. The model could arbitrarily reshuffle information between outputs, so long as the output as a whole reflects the input sufficiently well. So BERT embeddings are not "word embeddings" in the usual sense.