Nice that the author built this on tatoeba.org data. I used to be the main developer there, and for Tatoeba I created a language detector (because it was painful for people to have to input a sentence AND its language, especially for polyglots). So it's quite likely that the language data used for this language detector was itself produced by a language detector. Funny when you think about it :)
Really fascinating from a linguistics perspective. I'm curious how this works and whether it could be generalized to help with the cataloguing of dying languages.
I suspect it's due to mixed-language content on Wikipedia. A lot of Wikipedia articles talk about foreign-language art and culture; this is one of the largest (if not the largest) single categories of content on non-English Wikipedias.
https://github.com/allan-simon/Tatodetect (I should rewrite it in Rust some day). It's a simple N-gram detector.
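
For anyone wondering what a "simple N-gram detector" can look like, here is a minimal Python sketch of the general idea. It is not Tatodetect's actual code: the function names (`char_ngrams`, `train`, `detect`), the choice of character trigrams, and the add-one smoothing are all my own assumptions for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language.

    samples: dict mapping a language code to a list of sentences
    (hypothetical input format, e.g. labelled Tatoeba-style sentences).
    """
    profiles = {}
    for lang, sentences in samples.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(char_ngrams(sentence, n))
        profiles[lang] = counts
    return profiles


def detect(text, profiles, n=3):
    """Return the language whose profile gives the text the highest score.

    The score is a sum of log-probabilities with add-one smoothing,
    so n-grams never seen in training are penalized but not fatal.
    """
    grams = char_ngrams(text, n)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        vocab = len(counts) + 1  # +1 leaves room for unseen n-grams
        score = sum(
            math.log((counts[g] + 1) / (total + vocab)) for g in grams
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```

You would call `train` once on labelled sentences and then `detect(sentence, profiles)` for each new sentence. With only a handful of training sentences per language this is a toy; a real detector needs far more data, and short or mixed-language inputs will still trip it up.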