by binarymax
2381 days ago
Word embeddings are learned without supervision, so you do not choose the features themselves, only how many there are (the vector's dimensionality). The model then learns a scalar for each feature, giving one vector per term, and how it does so depends on the algorithm/architecture. With CBOW, for instance, and a window size of N, the vector learned for a term is based on the N terms on either side of it, treated as an unordered bag of words. As a result, terms that appear in the same contexts end up with similar vectors. That has its pros and cons, though. A good example is “the reservation is confirmed” vs “the reservation is cancelled”: confirmed/cancelled will have similar vectors even though they mean opposite things.
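
To make the confirmed/cancelled point concrete, here is a rough sketch. It is not CBOW itself: it swaps in plain co-occurrence counts within a window, which is a much simpler stand-in for the same distributional idea. The toy sentences and the helper names (`context_vectors`, `cosine`) are made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    """Count, for each term, the terms seen within `window` positions of it.

    This is a count-based stand-in for a learned embedding (an assumption
    for illustration, not the CBOW training procedure).
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the term itself
                    vectors[term][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: the two antonyms always share the same neighbours.
sentences = [
    "the reservation is confirmed",
    "the reservation is cancelled",
    "the booking is confirmed",
    "the booking is cancelled",
    "the cat sat on the mat",
]

vectors = context_vectors(sentences, window=2)

# Opposite meanings, identical contexts: similarity comes out very high.
print("confirmed vs cancelled:", cosine(vectors["confirmed"], vectors["cancelled"]))
# No shared context terms within the window: similarity is zero.
print("confirmed vs cat:", cosine(vectors["confirmed"], vectors["cat"]))
```

Nothing in the context tells the two words apart, so any method that only looks at neighbouring terms will place them close together. A trained model on a real corpus will be less extreme than this toy case, but the effect is the same in kind.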