
Thanks. I will try to answer.

ELD works like a traditional language detector: it stores n-grams with tuned scores, so it does not use a modern neural network.

It cleans the input text, splits it into words, and extracts n-grams (tokens) from them. The hash of each n-gram is looked up in a fast hashtable, where the matching entry points to score slots for however many languages have that n-gram. From those slots we build up a score for each language that was found.
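The first half of that pipeline (clean, split into words, take n-grams) could be sketched like this in Python. This is only an illustration, not ELD's actual code: the function name `extract_ngrams`, the fixed n-gram size of 3, and the space padding around each word are all assumptions made for the example.

```python
import re

def extract_ngrams(text, n=3):
    """Return character n-grams taken word by word (hypothetical helper)."""
    # "Cleaning" here is just lowercasing and keeping runs of letters;
    # the real cleaning rules are not described in this post.
    words = re.findall(r"[^\W\d_]+", text.lower())
    ngrams = []
    for word in words:
        # Assumption: pad with spaces so word boundaries become part of the n-gram.
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            ngrams.append(padded[i:i + n])
    return ngrams

print(extract_ngrams("Hi you"))
```

Each string in the returned list would then be hashed and looked up in the database.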

It sounds simple, and it is, because the real work is done when training the database, which is when the score values are set.

The database looks something like: {"ngram_1": {Lang_id_1: score, Lang_id_7: score, ...}}, {"ngram_2": {Lang_id_5: score}}
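To make the lookup step concrete, here is a minimal Python sketch of scoring against a database of that shape. The toy entries, the numeric language ids, the function name `detect`, and the choice to simply sum the scores are all assumptions for illustration; how the scores are really combined is not covered in this post. A Python dict hashes its keys internally, so it stands in for the hashtable.

```python
from collections import defaultdict

# Hypothetical toy database: n-gram -> {language id: score}.
DATABASE = {
    "the": {1: 0.9, 2: 0.1},
    "ing": {1: 0.7},
    "der": {2: 0.8, 1: 0.1},
}

def detect(ngrams, database):
    """Add up per-language scores for the given n-grams and return the best id."""
    totals = defaultdict(float)
    for ngram in ngrams:
        # Unknown n-grams contribute nothing.
        for lang_id, score in database.get(ngram, {}).items():
            totals[lang_id] += score  # assumption: plain sum of scores
    if not totals:
        return None  # no known n-gram was found
    return max(totals, key=totals.get)

print(detect(["the", "ing", "xyz"], DATABASE))
```

With this toy data, language 1 collects the scores for "the" and "ing", language 2 only gets the small score for "the", and "xyz" is ignored because it is not in the database.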

I hope this answers your question. I could go into more detail.

Also, if anybody finds this interesting, you could "vouch" for this post so that it becomes public; it is currently hidden because I am a new user.