by karpathy
4239 days ago
Thanks for the upvotes! I'm a little conflicted about this being linked around too much because it is still very much a work in progress. I work on this guide on the side because I think there's a lot of interest in these models out there, and not enough explanations. On a related note, some of you might also be interested in a Stanford CS class I'll be co-teaching with my adviser next quarter. It will be focused primarily on Convolutional Networks (but a lot of the learning machinery is generic): CS231n: Convolutional Neural Networks for Visual Recognition
http://vision.stanford.edu/teaching/cs231n/
I hope to make a lot of the materials/code freely available so everyone can follow along, and I will continue my work on this guide in parallel whenever I can squeeze in time. (And I'd be happy to hear any feedback on the content/style/presentation.)
|
1) In "Becoming a Backprop Ninja", dx is never declared. Is it a variable that was set in a previous pass, or is it some value like 1 that depends on its function? I understand how to derive dx1 and dx2 from dx.
2) In the SVM, can "pull" take values other than -1, 0, 1? It seems like it might affect how rapidly the SVM learns.
3) It takes 300 iterations to train the binary classifier. Could you add a brief paragraph about why this isn't 30 or 3000, and what has the greatest effect on the number of iterations?
4) In the 2D SVM, what is the α function? Are w0^2 and w1^2 an arbitrary way to regularize (for example, would |w0| + |w1| work as well)? It made me think of least squares fitting or RMS power, so I wondered what the reasoning was behind it.
Many more questions (and a sense of irony because I have learned NNs in the past and have since forgotten) so I am trying to give my brain some breadcrumbs.
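
For anyone following questions 2 and 4, here is a minimal sketch of one update step for a 2D linear classifier. It is not the guide's code. The function names, the ±1 margin thresholds, the step size `lr`, and the strength `alpha` are all assumptions made for illustration. It only shows how a pull restricted to -1, 0, 1 combines with either penalty: the gradient of w0^2 + w1^2 shrinks each weight in proportion to its size, while the gradient of |w0| + |w1| pushes each weight toward zero by a constant amount.

```python
def pull_for(score, label):
    """Tug on the score: +1, -1, or 0 (assumed hinge-style rule with margin 1)."""
    if label == 1 and score < 1:
        return 1.0
    if label == -1 and score > -1:
        return -1.0
    return 0.0


def reg_grad(w, kind):
    """Gradient of the penalty on the weights.

    "l2": d/dw (w0^2 + w1^2) = 2*w
    "l1": d/dw (|w0| + |w1|) = sign(w)
    """
    if kind == "l2":
        return [2.0 * wi for wi in w]
    return [float((wi > 0) - (wi < 0)) for wi in w]


def step(w, b, x, label, kind="l2", lr=0.01, alpha=1.0):
    """One update: follow the pull on the score, minus the penalty gradient."""
    score = w[0] * x[0] + w[1] * x[1] + b
    pull = pull_for(score, label)
    g = reg_grad(w, kind)
    new_w = [w[i] + lr * (pull * x[i] - alpha * g[i]) for i in range(2)]
    new_b = b + lr * pull  # the bias is left unregularized in this sketch
    return new_w, new_b


w_l2, b_l2 = step([1.0, -1.0], 0.0, [1.0, 1.0], label=1, kind="l2")
w_l1, b_l1 = step([1.0, -1.0], 0.0, [1.0, 1.0], label=1, kind="l1")
```

In this sketch a pull of larger magnitude would act like a larger step size for that example, which is one way to see why its scale could affect how quickly training converges.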