by brockf
2353 days ago
Most implementations are actually moving in the opposite direction. Previously, the tendency was to aggregate words into phrases to better capture the "context" of a word. Now, most approaches split words into sub-word parts or even characters. With networks that capture sequential relationships across tokens (as opposed to older "bag of words" models), multi-word patterns can effectively be captured by attending to the order of the sub-word parts.
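As a rough illustration of the sub-word splitting, here is a minimal sketch of greedy longest-match tokenization in the WordPiece style. The vocabulary `TOY_VOCAB` and the function names are made up for the example and are not taken from any library; real tokenizers learn a vocabulary of tens of thousands of pieces from data.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedily split one word into the longest sub-word pieces found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this example.
TOY_VOCAB = {"new", "york", "un", "##believ", "##able", "token", "##ization"}


def tokenize(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, then split each word into sub-word pieces."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


if __name__ == "__main__":
    print(tokenize("New York"))
    print(tokenize("unbelievable tokenization"))
```

Note that "New York" stays as two ordered tokens rather than one merged phrase; it is up to the sequence model downstream to pick up on the fact that they occur next to each other.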
|
Indeed. Do you have an example of a library or snippet that demonstrates this?
My limited understanding of BERT (and other) word embeddings was that they only contain the word's position in the 768 (I believe) dimensional space but don't contain queryable temporal information, no?
I like ngrams as a sort of untagged / unlabelled entity.