Hacker News
by indubitably 4273 days ago
Testing a statistical language identifier with texts this short is absurd. If you type in four or five words from

https://en.wikipedia.org/wiki/List_of_English_words_of_Frenc...

…do you expect it to return French or English?
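
To make the ambiguity concrete, here is a minimal sketch of a character-trigram scorer in Python. It is not how franc or any particular identifier is implemented; the two sample sentences and the names `SAMPLES`, `trigrams`, and `score` are made up for illustration. The point is only that a four- or five-word input yields very few trigrams, so the result can hinge on one or two of them.

```python
from collections import Counter

# Tiny, hypothetical training samples (an assumption for illustration);
# a real identifier would build its profiles from far larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs away through the field",
    "fr": "le renard brun saute par dessus le chien paresseux et puis il court dans le champ",
}

def trigrams(text):
    """Return the character trigrams of text, lowercased and space-padded."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# One trigram profile per language.
PROFILES = {lang: Counter(trigrams(sample)) for lang, sample in SAMPLES.items()}

def score(text):
    """For each language, the fraction of the input's trigrams seen in its profile."""
    grams = trigrams(text)
    if not grams:
        return {lang: 0.0 for lang in PROFILES}
    return {
        lang: sum(g in profile for g in grams) / len(grams)
        for lang, profile in PROFILES.items()
    }

if __name__ == "__main__":
    # A few loanwords: only a handful of trigrams to compare.
    print(score("menu restaurant ballet"))
    # A longer passage gives the scorer much more evidence.
    print(score(SAMPLES["en"]))
```

Running it on a few loanwords and then on a longer passage shows how much less evidence the short input gives the scorer to work with.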


It is not absurd. Generally, if humans can do it, it is a reasonable task for NLP to attempt.

Yes, you can present edge cases where there is no definite answer, like the one you cite, but that doesn't mean the task in general is impossible or useless.

I agree the task is neither impossible nor useless. There's work to do, and short passages should be supported. I do, however, think franc does a good job, and it adds support for some languages that (I think) had never been supported before today. Franc certainly "attempts" to fix language detection, which I would argue is an AI-complete problem.