by nl
2138 days ago
This is true, but I don't think you are using word embeddings the way most people use them. The linear relationship between things like king/queen etc. is a cute demo but not really useful or used in practice. The real usefulness of word embeddings is that similar concepts are close to each other, so they make a great representation for other models (vs something like TF-IDF). These days they have been mostly surpassed in terms of state of the art by full language models, but the point is that simple techniques like averaging the embeddings of the words in a sentence generalised really well to unseen data. And if you add in subword embeddings they generalise to unseen words, too. We could talk about how context lets language models do this even better, but I'm still back trying to persuade the OP that this isn't just memorisation and good ML models work well on unseen data!
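A minimal sketch of the "average the word embeddings" idea, in case it helps. The 3-d vectors below are invented for illustration (real embeddings are learned from a corpus and are much larger), and `sentence_vector` and `cosine` are hypothetical helper names, not any library's API.

```python
import math

# Toy 3-d vectors, made up for this example only.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "sat": [0.1, 0.8, 0.1],
    "slept": [0.1, 0.7, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "prices": [0.1, 0.0, 0.8],
    "fell": [0.1, 0.3, 0.6],
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a sentence."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


seen = sentence_vector("cat sat")
unseen = sentence_vector("kitten slept")  # shares no words with "cat sat"
unrelated = sentence_vector("stock prices fell")

# The sentence with no word overlap still lands closer to "cat sat"
# than the unrelated one does, because its words have nearby vectors.
print(cosine(seen, unseen) > cosine(seen, unrelated))
```

A downstream model trained on vectors like `seen` gets something sensible for `unseen` even though none of its words appeared in training, which is the generalisation being claimed.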
If you come from the tfidf direction, you can first tune up BM25 or something based on the ks-divergence, then you can use a random matrix, LDA, or the deep-network autoencoder that I worked on, which crushed conventional tfidf vectors down to 50-d vectors.
(Like many things people want to apply word vectors to, you go from 50% accuracy here to 70%, but we know it because we tested it on TREC gov2.)
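Of the options listed, the random matrix is the easiest to show. This is a rough sketch of a Gaussian random projection from a sparse tfidf-style vector to a dense one; it is not the autoencoder mentioned above, and the function names, the `out_dim=50` default, and the toy input are all assumptions for illustration.

```python
import math
import random


def random_projection_matrix(vocab_size, out_dim=50, seed=0):
    """One Gaussian row per vocabulary term, scaled by 1/sqrt(out_dim)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [
        [rng.gauss(0.0, scale) for _ in range(out_dim)]
        for _ in range(vocab_size)
    ]


def project(sparse_vec, matrix):
    """Project a sparse vector (term index -> weight) to a dense vector."""
    out_dim = len(matrix[0])
    out = [0.0] * out_dim
    for idx, weight in sparse_vec.items():
        row = matrix[idx]
        for j in range(out_dim):
            out[j] += weight * row[j]
    return out


# Hypothetical document: three non-zero tfidf weights in a 1000-term vocabulary.
matrix = random_projection_matrix(vocab_size=1000)
doc = {3: 0.5, 417: 1.2, 998: 0.3}
dense = project(doc, matrix)
print(len(dense))
```

The matrix is fixed once (hence the seed) and reused for every document, so nothing is learned here; LDA or an autoencoder would replace `random_projection_matrix` with something fitted to the corpus.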
Today I'm interested in systems that have an input-to-action orientation, and there you have to be able to put together a story like: "these 10 messages are parsed correctly and not by accident". That requires that certain 'king/queen' inferences be done correctly or, alternatively, that the system has paths to recover from missing an inference.
Often there is no path to go from "popular models in the new A.I." to "something that can serve customers off the leash" and that's the problem.
Now I do like subword embeddings, but that just points out the problem that there is no such thing as a "word".
Let me justify that.
You can split up English text into words like "some text".split() but it is not easy to do it from audio. Speech is punctuated by silences, often in the middle of words whenever you make a "[st]op" sound, and this happens often enough that separating words is equivalent to the whole speech understanding problem.
We can break words into subwords and mash subwords together to make new words. (e.g. "Fourthmeal", "Juneteenth", "Nihilego")
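To make the subword point concrete, here is a small sketch that breaks words into character n-grams, loosely in the style of subword embedding models. The `<` and `>` boundary markers and the 3-to-5 n-gram range are assumptions picked for the example, and `char_ngrams` and `shared_ngrams` are hypothetical names.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word.lower()}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def shared_ngrams(a, b):
    """N-grams that two words have in common."""
    return char_ngrams(a) & char_ngrams(b)


# A mashed-together word shares pieces with both of its sources.
print("june" in shared_ngrams("Juneteenth", "June"))
print("teen" in shared_ngrams("Juneteenth", "nineteenth"))
```

A model that embeds n-grams instead of whole words can build a vector for "Juneteenth" from pieces it has seen elsewhere, but note that nothing here knows where one word ends and the next begins; the input was already split.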
Also, there are many cases where you can replace a phrase with a word or a word with a phrase. Putting 'word' at the center of a model means the system is going to be in trouble with linguistic phenomena that happen 30% of the time.