by kristianp 4803 days ago
So each language's tokens are stored in the Judy array with a language prefix. This means that to identify the language of a token, you have to loop over all languages, prefix the token for each one, and look it up, keeping a count of matches per language. Does that sound correct?

I wouldn't have used the prefix approach. Instead, I'd store each token once in the Judy array and use the value stored with it to indicate which languages match the token.
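
To make the difference concrete, here is a rough sketch with plain Python dicts standing in for the Judy array. Everything in it is an assumption for illustration: the language codes, the sample tokens, the `lang:token` key format, and the idea of packing the matching languages into a bitmask value. It is not the project's actual code.

```python
# Plain dicts stand in for the Judy array; all names and data are hypothetical.
LANGS = ["en", "de", "fr"]
SAMPLES = {"en": ["the", "hand"], "de": ["der", "hand"], "fr": ["le"]}

def build_prefixed(samples):
    """Prefix approach: one key per (language, token) pair."""
    store = {}
    for lang, tokens in samples.items():
        for tok in tokens:
            store[f"{lang}:{tok}"] = 1
    return store

def match_prefixed(store, token):
    """One lookup per language for every token."""
    return [lang for lang in LANGS if f"{lang}:{token}" in store]

def build_bitmask(samples):
    """Single-key approach: one key per token, value is a language bitmask."""
    store = {}
    for lang, tokens in samples.items():
        bit = 1 << LANGS.index(lang)
        for tok in tokens:
            store[tok] = store.get(tok, 0) | bit
    return store

def match_bitmask(store, token):
    """One lookup per token, then decode the bits into language codes."""
    mask = store.get(token, 0)
    return [lang for i, lang in enumerate(LANGS) if mask & (1 << i)]
```

Both `match_prefixed(build_prefixed(SAMPLES), "hand")` and `match_bitmask(build_bitmask(SAMPLES), "hand")` return the same languages, but the second does a single lookup per token regardless of how many languages are loaded, and shared tokens are stored only once.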