by andbberger 4000 days ago
Virtually everyone uses gradient-based methods in the end to fine-tune the weights.

Yes, there are other methods. Contrastive divergence (CD) seems to be king right now; also of note is minimum probability flow learning [1] (of which CD is a special case). However, the flavor of these methods tends to be tuning the weights of the model to maximize how closely the model's distribution matches the probability distribution of the data. One generally cannot constrain the model parameters (e.g. by freezing a layer) and retain the model's ability to 'learn' the data distribution.
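To make "tuning the weights to match the data distribution" concrete, here is a rough sketch of a single CD-1 update. It assumes a Bernoulli restricted Boltzmann machine, which is my choice for illustration and not something stated above; the names `sigmoid`, `cd1_step` and `lr` are made up for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(W, b, c, v0, rng, lr=0.1):
    """One CD-1 update for a Bernoulli RBM (illustrative sketch only).

    W:  (n_visible, n_hidden) weights
    b:  (n_visible,) visible biases
    c:  (n_hidden,) hidden biases
    v0: (batch, n_visible) binary data, as floats
    """
    # Positive phase: hidden probabilities and samples driven by the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Negative phase: one Gibbs step gives a sample closer to the model.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + c)

    # Move the weights toward data statistics and away from model statistics.
    n = v0.shape[0]
    W_new = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_new = b + lr * (v0 - v1).mean(axis=0)
    c_new = c + lr * (ph0 - ph1).mean(axis=0)
    return W_new, b_new, c_new
```

You would call this repeatedly over mini-batches with `rng = np.random.default_rng(seed)`. Note that the update touches every parameter: the difference between data-driven and model-driven statistics is what pulls the model's distribution toward the data's.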

[1] http://arxiv.org/abs/0906.4779