I see https://github.com/google/cld3, but how does this compare with https://github.com/CLD2Owners/cld2 which is used by the large https://commoncrawl.org project to classify billions of samples from the whole internet?
https://github.com/pemistahl/lingua-py#4-how-good-is-it
CLD 2 seems to be slightly less accurate than CLD 3 on average.