by jg8610
3701 days ago
It's good to see people write up their experiments; it's useful for the rest of us to test how we understand neural nets. I think there are a few mistakes in your maths, though. You can learn a 1-1 discrete mapping through a single node when the input is a one-hot vector: you just assign a distinct weight to each of the input nodes, and then use a delta function on the other side. If I understood correctly, this is what you are doing. Also, if you use a tanh in your input layer but keep a linear output layer (as you start off with), you are still doing a linear approximation, because you have a rank-H matrix (where H is the size of the hidden layer) that is trying to linearly approximate your input data. This is done optimally using PCA. I'd second the advice to look into the Coursera courses, or the Nando de Freitas Oxford course on YouTube (which actually has a really nice derivation of backprop).
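To make the one-hot point concrete, here is a rough sketch of what I mean, using NumPy. The function names, the weights, and the target mapping are all made up for illustration and are not taken from the post; the "delta function" is approximated with a floating-point equality check.

```python
import numpy as np

def encode(index, weights):
    """Squeeze a one-hot input through a single linear node."""
    one_hot = np.zeros(len(weights))
    one_hot[index] = 1.0
    # The dot product simply picks out weights[index].
    return float(one_hot @ weights)

def decode(activation, weights, targets):
    """Delta-function readout: respond only to the weight that matches."""
    matches = np.isclose(weights, activation)
    return int(targets[int(np.argmax(matches))])

# One distinct weight per input node (hypothetical values).
weights = np.arange(1.0, 6.0)
# An arbitrary 1-1 mapping on five symbols (hypothetical).
targets = np.array([3, 0, 4, 1, 2])

recovered = [decode(encode(i, weights), weights, targets) for i in range(5)]
print(recovered)
```

The single node carries all the information only because the weights are distinct and the readout is allowed to be an exact match; nothing here is learned by gradient descent.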