| HN Mirror


	by andreasvc 4318 days ago
	BTW, here is my implementation of this idea: https://github.com/andreasvc/disco-dop/blob/master/web/parse... I haven't tested it on more than 3 languages, so it might perform badly, but my intuition is that it is easier to get good coverage of a language's vocabulary than to get the frequencies of something like the top character n-grams right. The latter are affected by the authorship and genre of the text, etc.
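
	To make the idea concrete, here is a rough sketch of vocabulary-coverage language guessing. It is not the code behind the link above; the word lists, the `VOCAB` table, and the `tokenize` and `guess_language` names are all made up for illustration, and a real version would load much larger vocabularies.

```python
import re

# Hypothetical, tiny word lists for illustration only; a usable system
# would load thousands of word forms per language.
VOCAB = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "this", "a"},
    "nl": {"de", "het", "een", "en", "is", "van", "dat", "niet", "dit", "in"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "von", "zu", "in"},
}


def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(text, vocab=VOCAB):
    """Return the language whose vocabulary covers the largest share of tokens.

    Returns None if there are no tokens or no language covers any token.
    """
    tokens = tokenize(text)
    if not tokens:
        return None
    # Coverage = fraction of tokens found in that language's word list.
    coverage = {
        lang: sum(token in words for token in tokens) / len(tokens)
        for lang, words in vocab.items()
    }
    best = max(coverage, key=coverage.get)
    return best if coverage[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("dit is niet het huis van de man"))
```

	Scoring by coverage only asks whether a word is known, not how often it occurs, which is why it should be less sensitive to authorship and genre than n-gram frequency profiles.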