by mcyc
535 days ago
NB: I can't edit my original reply. Sorry, I actually misread part of your comment in relation to the paper and confused δ with another parameter, K. To clarify: δ is the number of tokens in the tokenized corpus, and K is the size of the vocabulary. So if you are asking why they would limit _K_, my answer still applies (after swapping δ for K). But if you still mean "why do they pick some arbitrary δ as the limit on the size of the tokenized corpus?", then I think the answer is just "because that makes it a decision problem": instead of asking "what is the smallest tokenized corpus we can get?", you ask the yes/no question "can we get a tokenized corpus of at most δ tokens?".
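To make the optimization-vs-decision distinction concrete, here's a toy brute-force sketch. It is not the paper's formulation. I'm assuming a simplified setup where the corpus is a list of words, single characters are always available for free, K bounds the number of extra multi-character tokens, and δ bounds the total token count. The function names (`min_tokens`, `optimal_length`, `decision`) are made up for illustration.

```python
from itertools import combinations


def min_tokens(word, vocab):
    """Fewest tokens needed to segment `word`, assuming every single character
    is always available and `vocab` holds the extra multi-character tokens."""
    n = len(word)
    best = [0] + [n + 1] * n  # n + 1 acts as "infinity"; n tokens always suffice
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if len(piece) == 1 or piece in vocab:
                best[end] = min(best[end], best[start] + 1)
    return best[n]


def corpus_length(corpus, vocab):
    """Total number of tokens in the tokenized corpus for a given vocabulary."""
    return sum(min_tokens(w, vocab) for w in corpus)


def candidate_tokens(corpus):
    """All multi-character substrings of the corpus words (hypothetical candidate set)."""
    return sorted({w[i:j] for w in corpus
                   for i in range(len(w)) for j in range(i + 2, len(w) + 1)})


def optimal_length(corpus, K):
    """Optimization version: smallest tokenized-corpus length using at most K extra tokens."""
    cands = candidate_tokens(corpus)
    return min(corpus_length(corpus, set(v))
               for k in range(K + 1) for v in combinations(cands, k))


def decision(corpus, K, delta):
    """Decision version: is there a vocabulary of at most K extra tokens
    that tokenizes the corpus into at most delta tokens?"""
    cands = candidate_tokens(corpus)
    return any(corpus_length(corpus, set(v)) <= delta
               for k in range(K + 1) for v in combinations(cands, k))


corpus = ["ab", "abc", "bc"]
print(optimal_length(corpus, 1))  # 5
print(decision(corpus, 1, 4))     # False
print(decision(corpus, 1, 5))     # True
```

The point is just that `decision(corpus, K, delta)` is true exactly when `optimal_length(corpus, K) <= delta`, so picking some δ is what turns "minimize the corpus length" into a yes/no question, which is the form complexity results are usually stated in.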