by dclowd9901 4480 days ago
I think if I were writing a language detector, it would have these features:

- learning heuristics based on user suggestion.

- extension filtering to differentiate similar languages.

- the algo would use the prominence and placement of whitespace and non-word characters to create the DNA of a language. If a file scores below a threshold against every language's DNA, it doesn't presume, it asks the user. If a file scores high against some language's DNA, it still allows user override. Whenever a user submits their own answer, that source file would be used to train the heuristic.
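
A rough sketch of how the three features above could fit together, in Python. Everything here is an assumption made for illustration: the names `profile`, `cosine`, and `Detector`, the choice of cosine similarity as the score, the 0.8 threshold, and the use of "last character on a line" as the only placement signal. It is one possible reading of the idea, not a tested design.

```python
import math
import re
from collections import Counter

NON_WORD = re.compile(r"\W")  # whitespace and punctuation, i.e. anything not [A-Za-z0-9_]


def profile(source):
    """Build the 'DNA' of a source string (hypothetical feature set).

    Prominence: how often each whitespace/non-word character occurs.
    Placement: which non-word character, if any, ends each line.
    """
    counts = Counter(NON_WORD.findall(source))
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped and NON_WORD.match(stripped[-1]):
            counts["EOL" + stripped[-1]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Detector:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff, would need tuning
        self.dna = {}  # language name -> accumulated Counter

    def train(self, language, source):
        """Fold a labelled file into that language's DNA.

        Call this whenever the user confirms or overrides a guess.
        """
        self.dna.setdefault(language, Counter()).update(profile(source))

    def detect(self, source, candidates=None):
        """Return (language or None, best score).

        `candidates` narrows the search, e.g. to the languages that share
        the file's extension. None as the language means: ask the user.
        """
        langs = self.dna if candidates is None else candidates
        p = profile(source)
        scores = {l: cosine(p, self.dna[l]) for l in langs if l in self.dna}
        if not scores:
            return None, 0.0
        best = max(scores, key=scores.get)
        score = scores[best]
        return (best if score >= self.threshold else None), score
```

The caller would show the guess (or a prompt when the language comes back as `None`), let the user change it, and then pass the final answer to `train`. Raw character counts are dominated by spaces and newlines, so a real version would likely need weighting or more placement features to separate similar languages.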

1 comments

This is because you likely think before you code.