by dclowd9901
4480 days ago
I think if I were writing a language detector, it would have these features:

- Learning heuristics based on user suggestions.
- Extension filtering to differentiate similar languages.
- An algorithm that uses the prominence and placement of whitespace and non-word characters to create the "DNA" of a language.

If a file scores below a threshold against a language's DNA, the detector doesn't presume; it asks the user. If a file scores high against the DNA, it still allows a user override. Whenever a user submits their indicator, the source file is used to train the heuristic.
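
A rough Python sketch of that idea might look like the following. Everything here is an assumption made for illustration: the names `features`, `Detector`, and `resolve`, the choice of cosine similarity for scoring, and the 0.6 threshold are all hypothetical, and a real detector would need far more training data and tuning.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical cutoff: below this similarity the detector asks instead of guessing.
ASK_THRESHOLD = 0.6


def features(source):
    """Count whitespace and non-word characters, plus the indent style of each line."""
    counts = Counter(ch for ch in source if not ch.isalnum() and ch != "_")
    for line in source.splitlines():
        indent = re.match(r"[ \t]*", line).group()
        if indent:
            counts["indent:" + ("tab" if "\t" in indent else "space")] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two feature-count mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Detector:
    def __init__(self):
        self.dna = defaultdict(Counter)     # language -> accumulated feature counts
        self.extensions = defaultdict(set)  # extension -> languages seen with it

    def train(self, source, language, extension=""):
        """Fold a labelled source file into that language's DNA."""
        self.dna[language].update(features(source))
        if extension:
            self.extensions[extension].add(language)

    def detect(self, source, extension=""):
        """Return (language or None, scores); None means 'ask the user'."""
        # Extension filtering: only compare against languages seen with this extension.
        candidates = self.extensions.get(extension) or set(self.dna)
        feats = features(source)
        scored = sorted(
            ((cosine(feats, self.dna[lang]), lang) for lang in candidates),
            reverse=True,
        )
        if not scored or scored[0][0] < ASK_THRESHOLD:
            return None, scored
        return scored[0][1], scored


def resolve(detector, source, extension="", user_choice=None):
    """Apply the user's answer or override if given, and learn from it."""
    guess, _ = detector.detect(source, extension)
    if user_choice is not None:
        detector.train(source, user_choice, extension)
        return user_choice
    return guess  # may be None: the caller should ask the user
```

When `resolve` returns `None`, the caller would prompt the user and call it again with `user_choice` set, which is also how an override of a confident guess would feed back into training.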