by michael_nielsen 4585 days ago
They're discussed later in the book. The first chapter is an introduction, and I didn't want to introduce convolutional nets before (for example) fundamental techniques such as stochastic gradient descent and backpropagation.

Great! How far does the book go in terms of advanced approaches? Up to the current state of research?
My current plan is to describe some pretty recent results -- most likely, the big breakthrough on ImageNet by Krizhevsky, Sutskever and Hinton (http://www.cs.utoronto.ca/~ilya/pubs/2012/imgnet.pdf), which uses convolutional nets. I may also describe the famous Google-Stanford "cat neuron" paper (http://ai.stanford.edu/~ang/papers/icml12-HighLevelFeaturesU... ). But at this point things are moving so quickly that I'll keep my options open, and if more exciting things come up, I may change my plans.

Of course, there's a tremendous amount going on, so my broader philosophy is to focus on fundamentals. Readers who thoroughly master the core ideas shouldn't have much trouble later getting up to speed with the result-of-the-month.

>some pretty recent results -- most likely, the big breakthrough on ImageNet by Krizhevsky, Sutskever and Hinton (http://www.cs.utoronto.ca/~ilya/pubs/2012/imgnet.pdf), which uses convolutional nets.

The kernels learned by the first convolutional layer (Figure 3 on page 6) bear an uncanny resemblance to the Gabor-function models of orientation-selective cells ("bar and grating cells") in the primary visual cortex. Looks like computers are on the right track :)

http://www.cs.rug.nl/~petkov/publications/bc1997.pdf

"The discovery of orientation-selective cells in the primary visual cortex of monkeys almost 40 years ago and the fact that most of the neurons in this part of the brain are of this type ..."

The difference here is a "numbers game" - the visual cortex contains cells whose receptive-field positions, eccentricities, sizes, orientations, and numbers of excitatory and inhibitory zones (e.g. Fig. 1 in the link) give reasonable coverage of the space of possible values. I.e. the number of these cells is in the millions vs. 96 kernels. Of course it is only a matter of computing power to run all reasonable combinations of kernels emulating the real visual cortex, yet that would pose an immense computational challenge for the second and subsequent layers until we understand what [should] happen there.
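To make the "numbers game" concrete, here is a minimal sketch of a hand-built Gabor filter bank. It is not the method of either linked paper: the function name `gabor_kernel`, its parameter names, the 11x11 size, and the chosen grids of orientations, wavelengths and phases are all assumptions for illustration. It only shows how quickly the kernel count grows once you try to cover several parameters at once.

```python
import math
from itertools import product


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, phase=0.0):
    """Return a size x size Gabor kernel as a list of rows (size must be odd).

    The kernel is a Gaussian envelope multiplied by a cosine grating whose
    stripes are oriented by theta (radians). Parameter names are illustrative.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the grating varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical parameter grid: 8 orientations x 2 wavelengths x 2 phases.
orientations = [i * math.pi / 8 for i in range(8)]
wavelengths = [4.0, 8.0]
phases = [0.0, math.pi / 2]

bank = [
    gabor_kernel(11, w, t, sigma=0.5 * w, phase=p)
    for t, w, p in product(orientations, wavelengths, phases)
]
print(len(bank))  # 8 * 2 * 2 = 32 kernels
```

Even this coarse grid gives 32 kernels at a single size and position; adding more sizes, eccentricities and zone counts multiplies the total further, which is the coverage problem described above.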

FWIW, many vision researchers believe that the resemblance of the first convolutional layer to Gabor filters is perhaps more a case of selection bias than anything else. The argument goes that were they not the output of the first layer, that paper wouldn't get accepted =)

I'm not sure if I fully believe this, but certainly there doesn't seem to be a very principled way to choose your network architecture. Different people propose different ones, and the fundamental justification for each one seems to be: "look, we recreate Gabor filters in layer 1 and we get good numbers at the end!"

Of course, NN people argue that that's almost exactly what vision people do as well, except in "feature-land" rather than "architecture-land".

>FWIW, many vision researchers believe that the resemblance of the first convolutional layer to Gabor filters is perhaps more a case of selection bias than anything else. The argument goes that were they not the output of the first layer, that paper wouldn't get accepted =)

Well, I can see the temptation - orientation and spatial-frequency selectivity are the major characteristics of cells in V1, and the receptive fields for the first layer there do look like Gabor functions:

http://www.scholarpedia.org/article/Area_V1#Receptive_fields

I agree that the resemblance of the learned kernels to Gabor functions is almost too good, which is why I used "uncanny" :) If it is real, then I think it reveals very interesting and, no pun intended, deep emergent properties of the neural net learning process (something along the lines of "maximum entropy kernels while still doing the job" as the asymptotic state).

Btw, is it really selection or confirmation bias?

And to expand on the previous point about convolving the input with many, many kernels - in V1 the number happens to be on the order of 40 per "pixel":

"V1 contains a vast number of neurons. In humans, it contains about 140 million neurons per hemisphere (Wandell, 1995), i.e. about 40 V1 neurons per LGN neuron. Such divergence gives scope for extensive processing of the images received from LGN."