Hacker News new | ask | show | jobs
by dhammack 4576 days ago
I like to think of the vectorized representation as just a nonlinear transformation to a higher dimensional space with a classifier afterward. If you're familiar with linear algebra, then z = Wx, where W is a matrix of weights and x is a feature vector, maps x (which could be something like 5-dimensional) to a new space (which could be something like 50-dimensional). z is the representation of x in that new space. After this linear mapping, we apply a nonlinear transform (sigmoid, rectifier, etc.). If we didn't have the nonlinear transform, then the entire model would just be linear! This follows from the fact that the composition of linear functions is itself linear.
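
To make the "composition of linear functions is linear" point concrete, here is a rough sketch in plain Python. The helper names (matvec, matmul, relu), the random weights, and the 3-dimensional output are all assumptions made for illustration; only the 5 and 50 dimensions come from the example above.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (m x k) by matrix B (k x n)."""
    cols_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_of_B]
            for row in A]

def relu(z):
    """Rectifier nonlinearity, applied elementwise."""
    return [max(0.0, v) for v in z]

rng = random.Random(0)  # fixed seed, purely for repeatability
W1 = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(50)]  # 5 -> 50
W2 = [[rng.uniform(-1, 1) for _ in range(50)] for _ in range(3)]  # 50 -> 3 (arbitrary)
x = [rng.uniform(-1, 1) for _ in range(5)]

# Two linear layers with no nonlinearity in between...
stacked = matvec(W2, matvec(W1, x))
# ...give the same result as one linear layer with weights W2 @ W1.
collapsed = matvec(matmul(W2, W1), x)
max_gap = max(abs(a - b) for a, b in zip(stacked, collapsed))

# With a rectifier in between, no single matrix reproduces the map in general.
with_relu = matvec(W2, relu(matvec(W1, x)))
```

Here max_gap should only reflect floating point rounding, which is the sense in which the extra layer buys nothing without the nonlinearity.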

The final layer is just a standard logistic regression classifier in the new (usually higher dimensional) space.
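
Putting the pieces together, a minimal sketch of the whole forward pass might look like the following. This assumes a single hidden layer with a sigmoid, a binary classifier on top, and random untrained weights. The function name forward and the parameters w_out and b_out are made up for illustration, and the hidden layer has no bias term, to match z = Wx above.

```python
import math
import random

def sigmoid(v):
    """Logistic function, squashing a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W, w_out, b_out):
    """Map x into the new space, then run logistic regression there."""
    # Linear map z = Wx: the representation of x in the new space.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    # Elementwise nonlinearity on the hidden representation.
    h = [sigmoid(v) for v in z]
    # Final layer: plain logistic regression on h.
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    return h, sigmoid(logit)

rng = random.Random(0)  # fixed seed, purely for repeatability
W = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(50)]  # 5 -> 50
w_out = [rng.uniform(-1, 1) for _ in range(50)]                   # 50 -> 1
b_out = 0.0
x = [rng.uniform(-1, 1) for _ in range(5)]

h, p = forward(x, W, w_out, b_out)
print(len(h), p)  # hidden size and the predicted probability of the positive class
```

Training (backpropagation) is left out on purpose; this only shows the "transform, then classify" view of the model.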

1 comment

Haha, I get that! I was just saying that for someone learning it for the first time, this seems easier than approaching it from an object-oriented angle, with individual neurons stored in a graph data structure. It's easier to tackle when you can summarize it exactly as you did there, versus "oh, there are these neurons with these connections, and you forward propagate through each individual weight vector, then backpropagate," etc.