by andreasvc
4271 days ago
BTW, here is my implementation of this idea: https://github.com/andreasvc/disco-dop/blob/master/web/parse... I haven't tested it on more than 3 languages, so it might perform badly, but my intuition is that it is easier to get good coverage of a language's vocabulary than to get the frequencies of something like the top character n-grams right. The latter are affected by the authorship and genre of the text, &c.
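For anyone who wants to play with the idea, here is a minimal sketch of vocabulary-coverage language identification. It scores each language by the fraction of tokens found in that language's word list and picks the highest. The tiny word lists and the names `VOCAB`, `tokenize`, `coverage` and `identify_language` are hypothetical illustrations, not taken from the linked disco-dop code; a real setup would use much larger vocabularies per language.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy vocabularies; a real system would load thousands of
# frequent words per language from a corpus.
VOCAB: Dict[str, Set[str]] = {
    "en": {"the", "of", "and", "to", "is", "in", "it", "that", "was", "language"},
    "nl": {"de", "het", "een", "en", "van", "is", "dat", "niet", "taal", "ik"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ich", "von", "sprache", "ein"},
}


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; \w+ is Unicode-aware for str patterns.
    return re.findall(r"\w+", text.lower())


def coverage(tokens: List[str], vocab: Set[str]) -> float:
    # Fraction of tokens that occur in the given vocabulary.
    if not tokens:
        return 0.0
    return sum(tok in vocab for tok in tokens) / len(tokens)


def identify_language(text: str, vocab: Dict[str, Set[str]] = VOCAB) -> str:
    # Pick the language whose vocabulary covers the most tokens.
    tokens = tokenize(text)
    scores = {lang: coverage(tokens, words) for lang, words in vocab.items()}
    return max(scores, key=scores.get)


print(identify_language("the cat is in the house"))       # en
print(identify_language("ik weet niet wat de taal is"))   # nl
print(identify_language("das ist nicht die Sprache"))     # de
```

Unlike character n-gram profiles, nothing here depends on relative frequencies, only on whether words are known, which is why it may be less sensitive to the author or genre of a text (though it will struggle with very short inputs or out-of-vocabulary text).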