by jhanschoo
3090 days ago
On the slim chance that someone here wants to re-implement the unmodified Kneser-Ney algorithm[0]: the book's presentation of it does not account for unknown tokens, i.e. tokens in the query that are not in the vocabulary. I extended the recurrence to its natural closure, including unknown tokens, here [https://github.com/jhanschoo/HMMTagger/blob/master/readme.pd...]. It is a straightforward task, but deriving it and proving its correctness yourself might take an hour or two (probably more); I couldn't find such an extension in a Google search, and it is not described in the original paper either. I believe it would be similarly straightforward to extend this to modified Kneser-Ney.

[0]: The modification due to Chen & Goodman, which uses multiple discount values, is regarded as better-behaved smoothing and is more popular today.
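To make the problem concrete, here is a minimal Python sketch of one way to give unknown tokens nonzero probability in an interpolated bigram Kneser-Ney model. It is not the recurrence from the linked readme. It assumes a single discount, and it assumes the continuation distribution is interpolated one level further with a uniform distribution over the vocabulary plus one extra class that stands for every out-of-vocabulary token. The names `BigramKN`, `p_lower` and `prob` are made up for illustration.

```python
from collections import Counter


class BigramKN:
    """Interpolated bigram Kneser-Ney with one discount (illustrative sketch)."""

    def __init__(self, tokens, discount=0.75):
        self.d = discount
        self.vocab = set(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))  # c(v, w)
        self.context_total = Counter()  # c(v, .): how often v occurs as a context
        self.followers = Counter()      # N1+(v, .): distinct words seen after v
        self.preceders = Counter()      # N1+(., w): distinct words seen before w
        for (v, w), c in self.bigrams.items():
            self.context_total[v] += c
            self.followers[v] += 1
            self.preceders[w] += 1
        self.bigram_types = len(self.bigrams)  # N1+(., .)

    def p_lower(self, w):
        # Assumption: the discounted continuation probability is interpolated
        # with a uniform distribution over |V| + 1 types, the extra type being
        # the class of all unknown tokens.
        uniform = 1.0 / (len(self.vocab) + 1)
        discounted = max(self.preceders[w] - self.d, 0.0) / self.bigram_types
        backoff_mass = self.d * len(self.preceders) / self.bigram_types
        return discounted + backoff_mass * uniform

    def prob(self, w, v):
        """P(w | v); either token may be outside the training vocabulary."""
        total = self.context_total[v]
        if total == 0:
            # Unknown (or never-continued) context: use the lower order only.
            return self.p_lower(w)
        discounted = max(self.bigrams[(v, w)] - self.d, 0.0) / total
        backoff_mass = self.d * self.followers[v] / total
        return discounted + backoff_mass * self.p_lower(w)
```

With this choice, an unseen word after a seen context receives exactly the back-off mass times the uniform share, and for any context the probabilities over the vocabulary plus the one unknown class sum to 1. All unknown tokens share that single class, so the model cannot distinguish between two different unseen words.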