by kirillkh
3153 days ago
How can this be used for full-text search, e.g. with Lucene? The first step in indexing a document for full-text search is reducing each word to its base form, and the same is done for the search string. While that's not a difficult problem in English, in some languages (e.g. Hebrew) it's notoriously hard to figure out the base form of a word and then disambiguate its meaning, because the only way to do so is from context. So how can you easily build a stemmer/lemmatizer on top of these tools to perform such a task?
|
You can also use Brown clustering [3] to create the clusters. It does a good job and is faster to compute than clustering word vectors. However, clustered word vectors typically capture semantic similarity better.
1. https://www.slideshare.net/mobile/lucidworks/implementing-co...
2. Demo source: https://github.com/DiceTechJobs/ConceptualSearch
3. https://github.com/percyliang/brown-cluster
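
To make the cluster idea concrete, here is a rough sketch of how cluster output could feed query expansion. It assumes the Brown clustering tool in [3] writes a "paths" file with one tab-separated line per word (bit-string cluster path, word, count); check the tool's README before relying on that. The sample data and the helper names load_clusters and expand_query are made up for illustration and are not taken from the demo in [2].

```python
from collections import defaultdict

# Hypothetical sample in the ASSUMED "paths" layout:
# <bit-string cluster path> TAB <word> TAB <count>
SAMPLE_PATHS = "\n".join([
    "0010\tjava\t120",
    "0010\tscala\t45",
    "0011\tnurse\t80",
    "0011\trn\t30",
    "110\tthe\t5000",
])


def load_clusters(text, prefix_len=None):
    """Build word -> cluster and cluster -> words maps.

    Truncating the bit string to prefix_len characters merges
    sibling clusters into a coarser one (higher up the merge tree).
    """
    word_to_cluster = {}
    cluster_to_words = defaultdict(list)
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        path, word, _count = parts
        cluster = path[:prefix_len] if prefix_len else path
        word_to_cluster[word] = cluster
        cluster_to_words[cluster].append(word)
    return word_to_cluster, dict(cluster_to_words)


def expand_query(terms, word_to_cluster, cluster_to_words):
    """Return one group per query term: the term plus its cluster-mates."""
    groups = []
    for term in terms:
        term = term.lower()
        cluster = word_to_cluster.get(term)
        if cluster is None:
            groups.append([term])  # unknown word: no expansion
            continue
        mates = [w for w in cluster_to_words[cluster] if w != term]
        groups.append([term] + mates)
    return groups


w2c, c2w = load_clusters(SAMPLE_PATHS)
print(expand_query(["Java", "python"], w2c, c2w))
```

Each group could then become an OR clause in the search query, or the cluster ID itself could be indexed as an extra token next to the original word. Note that this is expansion by distributional similarity, not a lemmatizer: it does nothing to resolve which base form a word has in context.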