by kylebgorman 3938 days ago
I just can't agree that a simple, linear-time operation like "tokenizing words and basic n-gram models out of them" is a tedious problem, as you seem to be implying, nor do I feel that a solution to this very-solved problem is "compelling". Word tokenization and n-gram models are simple, unreasonably effective, and very fast. If character-based RNNs do better (albeit with far slower training), great, but there's nothing to see here; let's move along.

As I've posted here before, people have been training character n-gram models, without using neural networks, and getting language modeling performance comparable to that of word-based models for at least a decade. That it works with RNNs is no surprise, because it worked just fine with the much more constrained predecessor technology.
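
For anyone who hasn't seen one, here is a minimal sketch of a character n-gram model, just to show how little machinery is involved. The function names, the `^` padding symbol, and the add-k smoothing are my own assumptions for illustration; real systems typically use better smoothing (e.g. Kneser-Ney), so treat this as a toy, not as anyone's actual implementation.

```python
from collections import Counter, defaultdict
import math

PAD = "^"  # hypothetical start-of-text padding symbol, assumed absent from the data


def train_char_ngram(text, n=3):
    """Count next-character frequencies for each (n-1)-character context."""
    counts = defaultdict(Counter)
    padded = PAD * (n - 1) + text
    for i in range(len(text)):
        context = padded[i:i + n - 1]          # the n-1 characters before text[i]
        counts[context][padded[i + n - 1]] += 1
    return counts


def prob(counts, context, char, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(char | context)."""
    seen = counts.get(context, Counter())
    return (seen[char] + k) / (sum(seen.values()) + k * vocab_size)


def perplexity(counts, text, n, vocab_size, k=1.0):
    """Per-character perplexity of `text` under the trained model."""
    padded = PAD * (n - 1) + text
    log_sum = 0.0
    for i in range(len(text)):
        context = padded[i:i + n - 1]
        log_sum += math.log(prob(counts, context, padded[i + n - 1], vocab_size, k))
    return math.exp(-log_sum / len(text))


if __name__ == "__main__":
    model = train_char_ngram("abababab", n=2)
    # A string that follows the training pattern should score a lower
    # perplexity than one that does not.
    print(perplexity(model, "abab", 2, vocab_size=2))
    print(perplexity(model, "bbaa", 2, vocab_size=2))
```

Training is a single pass over the text, which is the linear-time point made above.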

1 comment

My problem isn't that the feature engineering is expensive or tedious; it's that it privileges a lot of information that NNs can learn from the data. Yeah, OK, Markov models (n-grams) are simple and fast, and they produce good results for generating representative text.

Deep RNNs are simple and produce good results for a huge, diverse range of problems with no new domain information. As Andrej Karpathy wrote:

> Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times.

N-grams don't have nearly the power (e.g., they can't capture structure that spans more than N tokens, like grammar) and don't generalize nearly as well, which makes their results a lot less surprising.
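
To make the range limitation concrete, here is a toy illustration. The helper name `visible_context` and the example strings are made up for this comment. An order-n model conditions only on the previous n-1 symbols, so two histories that differ further back look identical to it.

```python
def visible_context(history, n):
    """Return the only part of the history an order-n model conditions on."""
    return history[-(n - 1):] if n > 1 else ""


# Only the first history has an unclosed parenthesis that still needs a ")".
with_paren = "f(a, b, c, d"
without_paren = "f a, b, c, d"

# With n = 4 both histories collapse to the same 3-character context, so the
# model must assign ")" the same probability in both cases.
print(visible_context(with_paren, 4) == visible_context(without_paren, 4))

# The two only become distinguishable once n - 1 reaches back to the "(".
print(visible_context(with_paren, 12) == visible_context(without_paren, 12))
```

You can always raise n, but the number of possible contexts grows exponentially with it, and that is the generalization problem.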