by dignal
1 day ago
It's very interesting. Thank you for making it open-source. Would you mind explaining the inner workings of ELD? For example, what model of language does it produce, how does it compare a new word to the model, and how is the word scored for different languages?
ELD works like a traditional language detector, storing n-grams and tuned scores (so it does not use a modern neural network).
It cleans the input text, extracts the words, and splits them into n-grams/tokens. Each n-gram hash is looked up in a fast hashtable, which points to several score slots, one for each language that n-gram has a score for. From those we build up the scores for each of the languages found.
It sounds simple, and it is, because the real work is done when training the database and setting the score values.
The database looks something like {"ngram_1": {Lang_id_1: score, Lang_id_7: score, ...}}, {"ngram_2": {Lang_id_5: score}}
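To make the lookup-and-score flow a bit more concrete, here is a minimal Python sketch. It is not ELD's actual code: the tiny `NGRAM_DB`, its scores, the trigram size, the space padding around words, and the choice to simply sum scores are all assumptions made for illustration, and a plain dict stands in for the hashed table.

```python
import re
from collections import defaultdict

# Hypothetical toy database with made-up scores (assumption for illustration);
# a real trained database would be far larger and carefully tuned.
NGRAM_DB = {
    "the": {"en": 2.0},
    "he ": {"en": 1.0},
    " de": {"es": 1.2, "fr": 1.0},
    "los": {"es": 2.0},
    "les": {"fr": 2.0},
}


def extract_ngrams(text, n=3):
    """Clean the text, split it into words, and return character n-grams.

    Assumption: words are lowercased and padded with one space on each side
    so that word boundaries become part of the n-grams.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    ngrams = []
    for word in words:
        padded = f" {word} "
        ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams


def detect(text):
    """Return (best_language, scores) by accumulating per-language scores.

    Assumption: scores of matching n-grams are simply added together.
    """
    scores = defaultdict(float)
    for ngram in extract_ngrams(text):
        # Each n-gram found in the table contributes to one or more languages.
        for lang, score in NGRAM_DB.get(ngram, {}).items():
            scores[lang] += score
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


if __name__ == "__main__":
    print(detect("the cat"))
    print(detect("los libros de"))
```

With this toy table, "the cat" only matches English n-grams, while "los libros de" matches both Spanish and French entries and Spanish ends up with the higher total.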
I hope this answers your question. I could go into more detail.
Also, if anybody finds this interesting, you could "Vouch" for this post so that it becomes public; it is currently hidden because I am a new user.