Hacker News new | ask | show | jobs
by ninjin 3709 days ago
> Apparently it was a surprise to the AI NLP teams that spent years doing manual classification, suddenly a Deep NN out performed them without any prior knowledge. Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

Hogwash! While there is certainly some truth to what you say and how "Deep Learning" has become mainstream in NLP over the last two years, it is far from as easy as you portray it to be.

The key paradigm shift has been in the downplay (not removal, mind you) of hand-crafted features and moving away from imposing constraints on your model. State-of-the-art NLP research, in general, no longer tends to spend time coming up with new indicator features, coming up with clever constraints, or finding ways of training models that require approximation techniques to even be feasible computationally. Instead, models tend to learn in an end-to-end fashion, where manipulating the model structure is significantly easier and we now learn features as opposed to specify them by hand. This is great and something I am happy to be a part of, but, if you want state-of-the-art results it is still fairly common to mix in some "old-school" features as well, just to squeeze that very last bit of performance out of your model.

It is also not fair to say "without any prior knowledge". Even if you train a parser in the new paradigm (like Vinyals et al. (2014)), you still need to supply your model with training data describing syntactic structure, this data was largely constructed by linguists in the 90s. The same thing goes for pretty much any NLP task beyond simple lexical semantics. We also knew that distributional features were useful even before the "Deep Learning" revolution, see Turian et al. (2010) for example, where the "Deep Learning" methods of that time were defeated by an "old-school" co-occurrence clustering method from the early 90s. Heck, the whole idea of distributional semantics was alive and well throughout the early 2000s and can trace its roots back to work such as Harris (1954) and arguably even the later Wittgenstein.

Note that I am saying all of this as a "Deep Learner" that has been pushing this agenda for about four years now, and I will continue to work along these lines since I think that "Deep Learning" (or rather Representation Learning) is currently the best approach for semantics in NLP. But hype is dangerous, even if it in many ways supports my cause.

1 comments

Thank you for the input. Yes I was being a bit flippant and shallow, ewll more conversational really.

You're right about hype being dangerous.