by andbberger
4000 days ago
Virtually everyone uses gradient-based methods in the end to fine-tune the weights. Yes, there are other methods. Contrastive divergence seems to be king right now; also of note is minimum probability flow learning [1] (of which CD is a special case). However, the flavor of these methods tends to be tuning the weights of the model so as to bring the model's probability distribution as close as possible to that of the data. One generally cannot constrain the model parameters (e.g. by freezing a layer) and retain the model's ability to 'learn' the data distribution.

[1] http://arxiv.org/abs/0906.4779
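To make the last point concrete, here is a toy sketch. It is not CD or MPF, just exact gradient ascent on the log-likelihood of a three-state softmax model. The names `fit`, `kl`, and `frozen`, the target distribution, and the step size are all made up for illustration. Freezing two of the three logits stands in, very loosely, for freezing part of a network.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit(data_probs, frozen=(), steps=2000, lr=0.5):
    """Gradient ascent on the expected log-likelihood of a softmax model.

    frozen: indices of logits held at their initial value (0.0).
    """
    logits = [0.0] * len(data_probs)
    for _ in range(steps):
        model_probs = softmax(logits)
        for i in range(len(logits)):
            if i in frozen:
                continue
            # d/d logit_i of sum_k p_data[k] * log p_model[k]
            # equals p_data[i] - p_model[i].
            logits[i] += lr * (data_probs[i] - model_probs[i])
    return softmax(logits)

def kl(p, q):
    # KL(p || q), skipping zero-probability terms of p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

data = [0.7, 0.2, 0.1]                   # hypothetical data distribution
free = fit(data)                         # all parameters trainable
constrained = fit(data, frozen=(0, 1))   # two logits pinned at 0.0

print("free:       ", free, "KL =", kl(data, free))
print("constrained:", constrained, "KL =", kl(data, constrained))
```

With every logit free, the model distribution converges to the data distribution. With logits 0 and 1 pinned to the same value, states 0 and 1 are forced to have equal probability, so the model can never match 0.7 vs 0.2 no matter how long it trains. (Freezing just one logit would be harmless here, since softmax is invariant to shifting all logits by a constant.)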