by Radim
1592 days ago
My experience is the opposite: character n-gram models work "OK" on academic tasks and clean corpora, but not so well when unleashed on real data. By "real", I mean texts that mix multiple languages (super common on the web); short texts; texts in a different (unknown) language, where n-gram models have no way to say "I don't know" and return rubbish instead; texts in closely related languages; etc. Going "deep learning" is not the only alternative. Even simpler methods can work significantly better while being fully interpretable: https://link.springer.com/chapter/10.1007/978-3-642-00382-0_...
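
To make the "I don't know" point concrete, here is a minimal sketch of a naive character trigram identifier. Everything in it is an assumption for illustration: the toy training sentences, the add-one smoothing, and the helper names (`char_ngrams`, `train`, `score`, `identify`) are hypothetical and not taken from any particular library or from the linked paper. It only shows that a plain argmax over known languages always returns one of them, even for input in a language it never saw.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language from toy sample text."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def score(text, profile, vocab_size, n=3):
    """Add-one smoothed log-likelihood of the text under one language profile."""
    total = sum(profile.values())
    return sum(
        math.log((profile[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def identify(text, profiles, n=3):
    """Pick the best-scoring known language; there is no 'unknown' outcome."""
    vocab_size = len(set().union(*profiles.values())) + 1
    scores = {lang: score(text, p, vocab_size, n) for lang, p in profiles.items()}
    return max(scores, key=scores.get)


# Toy training data (assumption: far too small for real use).
profiles = train({
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
})

print(identify("the cat", profiles))
# Polish input: the model only knows "en" and "de", so it still answers with one of them.
print(identify("dziękuję bardzo", profiles))
```

Adding a rejection threshold on the score is one possible patch, but picking a threshold that holds up on short or mixed-language text is exactly where this kind of model gets fragile.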