by microtonal
4592 days ago
It's easy to write a language guesser, but it's not easy to write a good one. Obviously, accuracy depends heavily on the domain and the text length (as I also mentioned in another comment). But Cavnar and Trenkle, for example, obtained 99.8% accuracy on newsgroup articles in 14 languages using the method outlined above. There are very few NLP tasks where you can achieve such high accuracy with relatively simple and understandable methods. That's why it is a nice subject for an NLP introduction for, e.g., high school students. I have worked in parsing and generation, where it is difficult to obtain satisfying results even with many man-years of work on newspaper text, let alone tweets or YouTube comments ;).