by plq
2671 days ago
The stochastic strategy is to:

1. enumerate every possible tag combination,
2. assign a probability to each one,
3. choose the parse with the highest probability.

Step 1 can be done either deterministically or stochastically. Step 2 requires a language model trained on a human-tagged or semi-human-tagged corpus. Step 3 was just the Viterbi algorithm last time I looked.

Implementing steps 1 and 2 requires broad domain knowledge in two very different fields (linguistics and machine learning, respectively). So while sentence segmentation can be considered a solved problem nowadays, it's far from trivial to implement one that can compete with the state of the art on real-world data.

There is also a nice body of deterministic (rule-based) literature that is practically ignored nowadays.
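To make the "choose the parse with the highest probability" step concrete, here is a minimal sketch of Viterbi decoding over a bigram HMM tagger. Everything in it is an assumption for illustration: the `viterbi` function name, the three-tag set, and the toy probabilities are made up by hand, not trained on any corpus. Note that Viterbi finds the best tag sequence by dynamic programming, so it never has to list every combination explicitly.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Missing probabilities are floored (hypothetical smoothing choice) so that
    unseen events get a tiny score instead of log(0).
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]

    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximizes path score + transition.
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0.0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0.0)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path


# Toy model: all numbers below are invented for the example.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

if __name__ == "__main__":
    print(viterbi("the dog barks".split(), TAGS, START_P, TRANS_P, EMIT_P))
    # prints ['DET', 'NOUN', 'VERB']
```

The hard part the comment points at is not this loop but where `START_P`, `TRANS_P` and `EMIT_P` come from: in a real tagger they would be estimated from a tagged corpus, with proper smoothing rather than a flat floor.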