|
|
|
|
|
by thaumaturgy
1974 days ago
|
|
> For instance, sometimes stemming words is the right way to go, but if you are say doing a medical system where two very different medicines could be spelled similar and stem to the same word... FWIW lemmatization may be a good alternative to stemming. Stemming is algorithmic and can generate errors, as you point out; "caring" for example might naively be stemmed to "car". Lemmatization uses a dictionary of senses and their root words to avoid this. For common English, there's Princeton's Wordnet (https://wordnet.princeton.edu/). Lemmatizing technical niches, like medicine, would require an additional dictionary. |
|