

by aabajian 3967 days ago

I was fortunate to take this class the first time it was offered. I found it a great introduction to the material, but a bit over my head. Deep learning requires a strong grasp of linear algebra - and particularly at the "Stanford" level. My undergrad didn't prepare me well for visualizing outer products and matrix / tensor derivatives. Once you get over those hurdles, deep learning is quite fun. It often works like magic. I'll give you an example:

A firetruck is _____

Try typing this in Google and you'll get "red", "moving" and "made". During the course you build a network that learns next-word completions from arbitrary bodies of text. You can train it for hours, days or weeks...and it just gets better and better. Eventually you will max out the capacity of your network, but then you can fiddle with the number of nodes and other hyperparameters. In the end you're just training a "black-box" nonlinear function to best approximate an unknown function defined by the training data.
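For illustration, here is a minimal sketch of a tiny next-word network in NumPy: an embedding layer feeding a softmax over the vocabulary, trained with plain gradient descent on cross-entropy. This is not the course's model; the toy corpus, the function names, and the hyperparameters (`dim`, `epochs`, `lr`) are all hypothetical choices for the example.

```python
import numpy as np

def train_next_word(text, dim=16, epochs=500, lr=0.5, seed=0):
    """Hypothetical tiny model: embedding -> linear -> softmax over next word."""
    words = text.lower().split()
    vocab = sorted(set(words))
    idx = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 0.1, (len(vocab), dim))   # input embeddings ("hidden nodes" = dim)
    W = rng.normal(0, 0.1, (dim, len(vocab)))   # output projection to vocabulary logits
    xs = np.array([idx[w] for w in words[:-1]])  # current words
    ys = np.array([idx[w] for w in words[1:]])   # words that follow them
    n = len(xs)
    for _ in range(epochs):
        h = E[xs]                                  # (n, dim)
        logits = h @ W                             # (n, V)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
        d = p.copy()
        d[np.arange(n), ys] -= 1                   # d(mean cross-entropy)/d(logits)
        d /= n
        dW = h.T @ d
        dh = d @ W.T                               # computed before W is updated
        W -= lr * dW
        np.add.at(E, xs, -lr * dh)                 # scatter-add into embedding rows
    return vocab, idx, E, W

def complete(word, vocab, idx, E, W, k=3):
    """Return the k highest-scoring next words for `word`."""
    scores = E[idx[word]] @ W
    return [vocab[i] for i in np.argsort(scores)[::-1][:k]]

corpus = ("a firetruck is red . a firetruck is red . "
          "a firetruck is loud . the sky is blue .")
vocab, idx, E, W = train_next_word(corpus)
print(complete("is", vocab, idx, E, W))
```

Raising `dim` or training longer is the "fiddle with the number of nodes" knob in miniature; on real text you would also batch the data and hold out a validation set.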


xigency 3967 days ago

That example is just a simple Markov model. Using the 'T9' method of completing text is more of a novelty than something useful. I also have trouble with 'complete the sentence' type of programs because they don't actually create new ideas, they just rehash data. (It does have uses in OCR, voice recognition, and typing/texting.)
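To make the Markov-model point concrete, here is a minimal sketch of a first-order (bigram) completer that only counts which word follows which. The function names and the toy corpus are hypothetical; real keyboard predictors add smoothing and longer contexts.

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count, for each word, which words follow it (a first-order Markov chain)."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(model, word, k=3):
    """Return up to k most frequent successors; ties keep first-seen order."""
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

corpus = ("a firetruck is red . a firetruck is red . "
          "a firetruck is loud . the sky is blue .")
model = build_bigram_model(corpus)
print(suggest(model, "is"))  # ['red', 'loud', 'blue']
```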

I agree that the math can be complex, but I think it boils down to probability and the notation used to present the ideas more than the underlying concepts. Personally, I feel like the most advanced math used in NLP is the log function. Beyond that, it's mostly working with big arrays of data, or with structures like Markov models and neural nets, which tend to be just arrays of numbers.
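One common reason the log shows up so much: a sentence's probability is a product of many small per-word probabilities, which underflows floating point, while the sum of their logs stays representable. A minimal sketch, with made-up probabilities (100 tokens at 1e-5 each):

```python
import math

def sequence_log_prob(probs):
    """Log-probability of a sequence: sum of per-token log probabilities."""
    return sum(math.log(p) for p in probs)

probs = [1e-5] * 100           # hypothetical per-token probabilities
naive = math.prod(probs)       # true value 1e-500 is below the smallest double
log_p = sequence_log_prob(probs)
print(naive, log_p)            # naive underflows to 0.0; log_p is about -1151.3
```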

In a normal AI course, we had to write up contemporary AI articles, and one I found interesting was a model for summarizing text, including chapters, books, and other writing. The key idea was finding the most significant sentences in any given paragraph or unit and then using them verbatim.
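A minimal sketch of that "pick the most significant sentences verbatim" idea, using a generic word-frequency score. This is a common textbook heuristic, not the method from the paper cited further down the thread; the stopword list, scoring rule, and example text are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "was", "he", "she"}

def content_words(sentence):
    """Lowercased words with (hypothetical) stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(paragraph, n=1):
    """Extractive summary: keep the n highest-scoring sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s):
        # Average frequency of a sentence's content words across the paragraph.
        ws = content_words(s)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

text = ("The dragon guarded the gold. The knight stole the gold from the dragon. "
        "Then it rained.")
print(summarize(text, n=1))
```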

It might be interesting to take some of these simple ideas and flesh them out with some of these advanced AI methods. For example, finding a more complete meaning of a book chapter and rewriting the summary.

That's the kind of AI work that I think people expect and are looking for from the NLP field, and it's not necessarily out of reach currently.


agibsonccc 3967 days ago

I think a common example in the same vein is the analogies trick you always see. It's been demonstrated to death at this point, but the great thing here is that word2vec more or less learns to predict the next word using hierarchical softmax, so he's not technically "wrong", since this is the training objective. It's good to clarify it though.
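The analogies trick boils down to vector arithmetic plus a cosine-similarity search. A minimal sketch with hand-made 3-d vectors; these are hypothetical toy values chosen for illustration, not embeddings learned by word2vec.

```python
import numpy as np

# Hypothetical toy vectors: roughly [royalty, maleness, femaleness].
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.1, 0.1]),
}

def analogy(vecs, a, b, c):
    """Solve a : b :: c : ? by maximizing cosine similarity to b - a + c."""
    target = vecs[b] - vecs[a] + vecs[c]
    best, best_sim = None, -np.inf
    for w, v in vecs.items():
        if w in (a, b, c):          # skip the query words themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy(vecs, "man", "king", "woman"))
```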


xigency 3967 days ago

Yes, and I guess that goes along with the black box idea. What function you are training for depends on your needs, and that can be achieved with deep learning or soft AI.


gherkin0 3967 days ago

> and one I found interesting was a model for summarizing text, including chapters, books, and other writing.

Do you have a cite for that?


xigency 3967 days ago

Yes, I just found it actually. The article targets short stories specifically.

A. Kazantseva and S. Szpakowicz, "Summarizing Short Stories." Assoc. for Computational Linguistics, vol. 36, no. 1, pp. 71-109, Mar. 2010. [Online]. Available: http://www.mitpressjournals.org/doi/abs/10.1162/coli.2010.36...

There is a PDF available. It's about 40 pages long.


aadamson 3967 days ago

Your paper is the only paper listed above mine on the reports page! Solid last name optimization


aabajian 3967 days ago

lol, and my first name is "Aaron". Thank my parents.


akhilcacharya 3967 days ago

Where'd you do your undergrad?


aabajian 3967 days ago

I'd rather not bash my undergrad, but suffice to say I tutored intro linear algebra and was very comfortable with eigenvalues, eigenvectors, Gaussian elimination, and that kind of stuff. What was tricky in 224d was taking the gradients with respect to specific components of a matrix. In the end you get comfortable with what the result should look like, but if you actually write the matrix indices down, it's quite hairy (mostly tensor product(s) that can be rewritten as matrix outer products).
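Writing the indices down for a single linear layer shows where the outer product comes from. A sketch assuming $z = Wx$ feeds some scalar loss $J$, with $\delta$ the upstream gradient; the notation is mine, not necessarily the course's:

```latex
z = Wx, \qquad W \in \mathbb{R}^{m \times n},\; x \in \mathbb{R}^{n}, \qquad \delta = \frac{\partial J}{\partial z} \in \mathbb{R}^{m}

\frac{\partial z_k}{\partial W_{ij}} = \mathbb{1}[k = i]\, x_j
\qquad \text{(a third-order tensor indexed by } k, i, j\text{)}

\frac{\partial J}{\partial W_{ij}} = \sum_{k=1}^{m} \delta_k \, \mathbb{1}[k = i]\, x_j = \delta_i x_j
\quad \Longrightarrow \quad \nabla_W J = \delta\, x^{\top}
```

Contracting the tensor with $\delta$ collapses it to the familiar outer product.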


viksit 3967 days ago

oh yes. thinking in terms of numpy matrix operations while reading the equations took a lot of getting used to.
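A minimal sketch of that translation: the same weight gradient computed with explicit indices, with `np.outer`, and with finite differences. It assumes a hypothetical loss $J(W) = u \cdot (Wx)$, so the upstream gradient is just `u`.

```python
import numpy as np

def grad_by_indices(delta, x):
    """dJ/dW[i, j] = delta[i] * x[j], written the 'hairy' index-by-index way."""
    g = np.zeros((len(delta), len(x)))
    for i in range(len(delta)):
        for j in range(len(x)):
            g[i, j] = delta[i] * x[j]
    return g

def grad_numeric(W, x, u, eps=1e-6):
    """Central differences on each entry of W for J(W) = u . (W @ x)."""
    def J(M):
        return u @ (M @ x)
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (J(Wp) - J(Wm)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
W, x, u = rng.normal(size=(3, 4)), rng.normal(size=4), rng.normal(size=3)
analytic = np.outer(u, x)   # the vectorized form: delta x^T
print(np.allclose(grad_by_indices(u, x), analytic),
      np.allclose(grad_numeric(W, x, u), analytic, atol=1e-6))
```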


dopeboy 3967 days ago

I went to school with Aaron. Zot zot is all I'll say.
