by SixSigma 3709 days ago

I've watched all 8 available videos, which is as far as my knowledge goes. So far it has covered background on gradients, calculating derivatives, an introduction to word vectors and how they relate to each other, recurrent neural nets and how to push time series through them, an introduction to TensorFlow, and finally how to scan backwards and forwards through "time" in an RNN (in NLP, each word in a sentence is a time step).

Word vectors are "just" high dimensional entities - 100-300 dimensions, used as input. So the introduction to them was about how you go about building a dataset that is a collection of 50,000 column vectors each of which is 300 rows. And then how to use that to go on and build a neural net to do useful work.
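
To make the shape of that dataset concrete, here is a minimal sketch in plain Python. The four-word vocabulary, the 8 dimensions and the helper names (`lookup`, `cosine`, `sentence_to_vectors`) are all made up for illustration; the course material works at roughly 50,000 words by 300 dimensions, and the vectors would be learned rather than random. The table here stores one vector per row, which is just the transpose of the column layout described above.

```python
import math
import random

# Hypothetical toy sizes; the post describes roughly 50,000 words x 300 dimensions.
VOCAB = ["king", "queen", "apple", "orange"]
DIM = 8

random.seed(0)
word_to_index = {word: i for i, word in enumerate(VOCAB)}
# One vector per word, randomly initialised; training would adjust these values.
embeddings = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in VOCAB]

def lookup(word):
    """Return the vector stored for a word."""
    return embeddings[word_to_index[word]]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sentence_to_vectors(sentence):
    """Turn a whitespace-tokenised sentence into a list of vectors, one per time step."""
    return [lookup(word) for word in sentence.split()]
```

Something like `sentence_to_vectors("king queen")` then gives the sequence of inputs that would be fed to the network one time step at a time.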

The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus (e.g. all of Wikipedia is small), 300 dimensions for each word and then a loss function to classify each word.

One can imagine how that would be applied to sales data of multiple products or other data.

It goes on to suggest how sentiment analysis is performed and how entity recognition would work (entities being places, names of people and companies).

The info has been general but described in terms of NLP; the techniques so far are not just for use in NLP.

I'm not an NLP person and tbh I've never even made a neural net (although I could if I had a reason); I'm just interested in the subject.

1 comments

21 3709 days ago

> The conclusion is that all the work done on syntax, grammar and word classification can effectively be replaced by having a huge corpus

Is that a surprise? You don't teach a child how to speak by telling him about verbs and grammar. He will learn how to use them without having any formal idea about what they are.

SixSigma 3709 days ago

Apparently it was a surprise to the AI NLP teams that spent years doing manual classification when suddenly a Deep NN outperformed them without any prior knowledge. Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!
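
As a rough illustration of what "occurrence frequencies of word combinations" can mean, here is a toy sketch that counts which words appear near each other. The function name, the window size of 2 and the two-sentence corpus are assumptions for the example; real word vectors are learned or factorised down to a few hundred dimensions from far more text, so this only shows the raw counting idea.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, how often every other word appears within `window` positions."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

# Hypothetical two-sentence corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
vectors = cooccurrence_vectors(corpus)
```

In this tiny corpus "cat" and "dog" end up with identical context counts, which is the intuition behind words with similar usage getting similar vectors.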

microtonal 3708 days ago

> Apparently it was a surprise to the AI NLP teams [...]

Similar techniques were well known and used for years in NLP. E.g. Brown clustering has been used since the early nineties and has been shown to improve certain NLP tasks by quite an amount. NMF has also been used for quite some time to obtain distributed representations of words. Also, many of the techniques used in NLP now (word embeddings, deep nets) have been known for quite a while. However, the lack of training data and computational power has prevented these techniques from taking off earlier.

> Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

The 'rules of language' don't just fall out of word vectors. They fall out of embeddings combined with certain network topologies and supervised training. In my experience (working on dependency parsing), you also typically get better results by encoding language-specific knowledge. E.g. if your language is morphologically rich or does a lot of compounding, the coverage of word vectors is going to be pretty bad (compared to e.g. English). You will have to think about morphology and compounds as well. One of our papers that was recently accepted at ACL describes a substantial improvement in parsing German when incorporating/learning explicit information about clausal structure (topological fields).
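
The coverage point can be sketched with a toy measurement: the fraction of tokens in a sentence that have any entry in the embedding vocabulary. The `coverage` helper, the vocabulary and both sentences below are hypothetical and chosen only to show how a single compound can drop out of the vocabulary; they say nothing about real coverage figures.

```python
def coverage(tokens, vocabulary):
    """Fraction of tokens that have an entry in the vocabulary."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in vocabulary)
    return known / len(tokens)

# Hypothetical toy vocabulary and sentences, for illustration only.
vocabulary = {"the", "insurance", "company", "pays", "die", "zahlt"}
english = "the insurance company pays".split()
german = "die Versicherungsgesellschaft zahlt".split()

english_cov = coverage(english, vocabulary)  # every token is in the vocabulary
german_cov = coverage(german, vocabulary)    # the compound has no entry
```

Handling the uncovered compound would need something beyond a plain lookup, such as splitting it into known parts, which is the kind of language-specific work described above.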

Being able to train extremely good classifiers with a large amount of automatic feature formation does not mean that all the insights that were previously gained in linguistics or computational linguistics are suddenly worthless.

(Nonetheless, it's an exciting time to be in NLP.)

SixSigma 3708 days ago

I was rather oversimplifying a tad and being conversational (and I'm not an expert, not even much beyond beginner).

It is indeed an exciting time.

ninjin 3709 days ago

> Apparently it was a surprise to the AI NLP teams that spent years doing manual classification when suddenly a Deep NN outperformed them without any prior knowledge. Just make a 300 dimension vector of the occurrence frequencies of word combinations and out fall the rules of language!

Hogwash! While there is certainly some truth to what you say and how "Deep Learning" has become mainstream in NLP over the last two years, it is far from as easy as you portray it to be.

The key paradigm shift has been in the downplaying (not removal, mind you) of hand-crafted features and moving away from imposing constraints on your model. State-of-the-art NLP research, in general, no longer tends to spend time coming up with new indicator features, coming up with clever constraints, or finding ways of training models that require approximation techniques to even be feasible computationally. Instead, models tend to learn in an end-to-end fashion, where manipulating the model structure is significantly easier and we now learn features as opposed to specifying them by hand. This is great and something I am happy to be a part of, but if you want state-of-the-art results it is still fairly common to mix in some "old-school" features as well, just to squeeze that very last bit of performance out of your model.

It is also not fair to say "without any prior knowledge". Even if you train a parser in the new paradigm (like Vinyals et al. (2014)), you still need to supply your model with training data describing syntactic structure, and this data was largely constructed by linguists in the 90s. The same thing goes for pretty much any NLP task beyond simple lexical semantics. We also knew that distributional features were useful even before the "Deep Learning" revolution; see Turian et al. (2010) for example, where the "Deep Learning" methods of that time were defeated by an "old-school" co-occurrence clustering method from the early 90s. Heck, the whole idea of distributional semantics was alive and well throughout the early 2000s and can trace its roots back to work such as Harris (1954) and arguably even the later Wittgenstein.

Note that I am saying all of this as a "Deep Learner" that has been pushing this agenda for about four years now, and I will continue to work along these lines since I think that "Deep Learning" (or rather Representation Learning) is currently the best approach for semantics in NLP. But hype is dangerous, even if it in many ways supports my cause.

SixSigma 3708 days ago

Thank you for the input. Yes, I was being a bit flippant and shallow, well, more conversational really.

You're right about hype being dangerous.

SixSigma 3709 days ago

An interview with a pioneer in the field

https://medium.com/@dbeyer123/machines-that-dream-an-intervi...

tripzilch 3706 days ago

A child learns much more, and more deeply, about language from just a fraction of the amount of unsupervised data. The point is that the mechanisms are entirely different, so it's not very useful to compare.
