Hacker News
by mytochar 4000 days ago
Regarding backpropagation and training sections of the NN at different times: there are other training algorithms. Evolutionary training algorithms come to mind, and you could evolve any section you wanted. You could even train the layers one by one, so that each layer's output represents a certain form of input to the next layer.
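
To make the "evolve any section you wanted" idea concrete, here is a minimal sketch, not anything from the linked article. It assumes a made-up toy network (one input, two tanh hidden units, a linear output) and a simple (1+1) evolution strategy; the names `forward`, `loss`, and `evolve_output_layer` are hypothetical. Only the output weights are mutated, and the hidden weights stay fixed.

```python
import math
import random

def forward(w_hidden, w_out, x):
    """Toy network (an assumption for illustration): two tanh hidden units, linear output."""
    h = [math.tanh(w * x) for w in w_hidden]
    return sum(wo * hi for wo, hi in zip(w_out, h))

def loss(w_hidden, w_out, data):
    """Mean squared error over (x, y) pairs."""
    return sum((forward(w_hidden, w_out, x) - y) ** 2 for x, y in data) / len(data)

def evolve_output_layer(w_hidden, w_out, data, steps=500, sigma=0.1, seed=0):
    """(1+1) evolution strategy: mutate only w_out and keep a child if it is no worse."""
    rng = random.Random(seed)
    best, best_loss = list(w_out), loss(w_hidden, w_out, data)
    for _ in range(steps):
        child = [w + rng.gauss(0.0, sigma) for w in best]
        child_loss = loss(w_hidden, child, data)
        if child_loss <= best_loss:
            best, best_loss = child, child_loss
    return best, best_loss

# Toy regression target: y = sin(x) on a small grid.
data = [(x / 4.0, math.sin(x / 4.0)) for x in range(-8, 9)]
w_hidden = [0.5, 1.5]   # the section left untouched
w_out = [0.0, 0.0]      # the section being evolved
start = loss(w_hidden, w_out, data)
evolved, final = evolve_output_layer(w_hidden, w_out, data)
print(f"loss before: {start:.4f}, after: {final:.4f}")
```

Because a child is only accepted when it is no worse than its parent, the loss can never go up. Note that no gradient is ever computed.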
2 comments

Virtually everyone uses gradient-based methods in the end to fine-tune the weights.

Yes, there are other methods. Contrastive divergence seems to be king right now; also of note is minimum probability flow learning [1] (of which CD is a special case). However, the flavor of these methods tends to be tuning the weights of the model so as to maximize how closely the model's distribution matches the probability distribution of the data. One generally cannot constrain the model parameters (i.e., by freezing a layer) and retain the model's ability to 'learn' the data distribution.

[1] http://arxiv.org/abs/0906.4779
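
For readers who have not seen contrastive divergence, here is a rough sketch of a single-step (CD-1) update for a tiny binary restricted Boltzmann machine. The sizes, learning rate, training patterns, and helper names (`hidden_probs`, `visible_probs`, `cd1_step`) are all assumptions chosen for illustration. Each update moves the weights toward the statistics of the data and away from the statistics of the model's own one-step reconstruction.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample(probs, rng):
    """Draw a binary vector from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(W, c, v):
    # p(h_j = 1 | v) for each hidden unit j
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(W, b, h):
    # p(v_i = 1 | h) for each visible unit i
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_step(W, b, c, v0, rng, lr=0.1):
    """One in-place CD-1 update: data statistics minus reconstruction statistics."""
    ph0 = hidden_probs(W, c, v0)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(W, b, h0), rng)   # one Gibbs step back to the visibles
    ph1 = hidden_probs(W, c, v1)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible   # visible biases
c = [0.0] * n_hidden    # hidden biases
patterns = [[1, 1, 0, 0], [0, 0, 1, 1]]   # toy "data distribution"
for epoch in range(200):
    for v in patterns:
        cd1_step(W, b, c, v, rng)
```

Every parameter is updated in every step. That is the sense in which these methods fit the whole model to the data distribution instead of training one piece at a time.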

Those training algorithms should accomplish the same goal, but backpropagation (or some variant) turns out to be the best choice for the kind of network shown here (large, non-recurrent), because it directly uses the gradient of the error with respect to the weights, while evolutionary methods usually don't make use of this information. I'm not a researcher in this area, but I'm quite sure you'd have a bad time going with pure evolutionary methods for this kind of network.
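
A toy comparison may help show why the gradient matters as the number of weights grows. The sketch below is an assumption-laden stand-in, not a neural network: a 50-dimensional quadratic bowl plays the role of the error surface. Plain gradient descent and a (1+1) evolution strategy each get the same number of update steps. The step size, mutation scale, and function names are hypothetical choices.

```python
import random

def loss(w):
    # Toy quadratic bowl standing in for a network's error surface.
    return sum(x * x for x in w)

def grad(w):
    # Exact gradient of the quadratic: d/dw_i sum(w^2) = 2 * w_i.
    return [2.0 * x for x in w]

def gradient_descent(w, steps, lr=0.1):
    """Follow the negative gradient for a fixed number of steps."""
    for _ in range(steps):
        w = [x - lr * g for x, g in zip(w, grad(w))]
    return w

def one_plus_one_es(w, steps, sigma=0.1, seed=0):
    """Random Gaussian mutations; keep a child only if it is no worse."""
    rng = random.Random(seed)
    best, best_loss = list(w), loss(w)
    for _ in range(steps):
        child = [x + rng.gauss(0.0, sigma) for x in best]
        child_loss = loss(child)
        if child_loss <= best_loss:
            best, best_loss = child, child_loss
    return best

dim, steps = 50, 100
w0 = [1.0] * dim
gd_loss = loss(gradient_descent(w0, steps))
es_loss = loss(one_plus_one_es(w0, steps))
print(f"gradient descent: {gd_loss:.3e}, (1+1)-ES: {es_loss:.3e}")
```

In high dimensions a random mutation is nearly orthogonal to the downhill direction, so each accepted step makes little progress, while the gradient points straight downhill. Real evolutionary methods use populations and adaptive step sizes, so treat this only as an illustration of the basic intuition.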