|
There are brief sections about this in the Deep Learning book by Goodfellow, Bengio and Courville (2016):

> 18.2 Because the negative phase involves drawing samples from the model’s distribution, we can think of it as finding points that the model believes in strongly. Because the negative phase acts to reduce the probability of those points, they are generally considered to represent the model’s incorrect beliefs about the world. They are frequently referred to in the literature as “hallucinations” or “fantasy particles.” In fact, the negative phase has been proposed as a possible explanation for dreaming in humans and other animals (Crick and Mitchison, 1983), the idea being that the brain maintains a probabilistic model of the world and follows the gradient of log p̃ while experiencing real events while awake and follows the negative gradient of log p̃ to minimize log Z while sleeping and experiencing events sampled from the current model. This view explains much of the language used to describe algorithms with a positive and negative phase, but it has not been proven to be correct with neuroscientific experiments. In machine learning models, it is usually necessary to use the positive and negative phase simultaneously, rather than in separate time periods of wakefulness and REM sleep. As we will see in Sec. 19.5, other machine learning algorithms draw samples from the model distribution for other purposes and such algorithms could also provide an account for the function of dream sleep.

> 19.5.1 Wake-Sleep One of the main difficulties with training a model to infer h from v is that we do not have a supervised training set with which to train the model. Given a v, we do not know the appropriate h. The mapping from v to h depends on the choice of model family, and evolves throughout the learning process as θ changes.
The wake-sleep algorithm (Hinton et al., 1995b; Frey et al., 1996) resolves this problem by drawing samples of both h and v from the model distribution. For example, in a directed model, this can be done cheaply by performing ancestral sampling beginning at h and ending at v. The inference network can then be trained to perform the reverse mapping: predicting which h caused the present v. The main drawback to this approach is that we will only be able to train the inference network on values of v that have high probability under the model. Early in learning, the model distribution will not resemble the data distribution, so the inference network will not have an opportunity to learn on samples that resemble data. Another possible explanation for biological dreaming is that it is providing samples from p(h,v) which can be used to train an inference network to predict h given v. In some senses, this explanation is more satisfying than the partition function explanation. Monte Carlo algorithms generally do not perform well if they are run using only the positive phase of the gradient for several steps then with only the negative phase of the gradient for several steps. Human beings and animals are usually awake for several consecutive hours then asleep for several consecutive hours. It is not readily apparent how this schedule could support Monte Carlo training of an undirected model. Learning algorithms based on maximizing L can be run with prolonged periods of improving q and prolonged periods of improving θ, however. If the role of biological dreaming is to train networks for predicting q, then this explains how animals are able to remain awake for several hours (the longer they are awake, the greater the gap between L and log p(v), but L will remain a lower bound) and to remain asleep for several hours (the generative model itself is not modified during sleep) without damaging their internal models. 
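To make the sleep-phase idea in this passage concrete, here is a minimal sketch in plain Python. It is an illustration added to the quote, not code from the book. The toy directed model, its probabilities, and the names `sample_from_model`, `exact_posterior` and `sleep_phase` are all assumptions made for the example. The generative model is held fixed, (h, v) pairs are "dreamed" by ancestral sampling, and a tiny inference network q(h | v) is trained to predict which h caused each v.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical toy generative model (all numbers are illustrative):
# h ~ Bernoulli(P_H), then v | h ~ Bernoulli(P_V_GIVEN_H[h]).
P_H = 0.3
P_V_GIVEN_H = {0: 0.1, 1: 0.8}


def sample_from_model(rng):
    """Ancestral sampling: draw h first, then v given h."""
    h = 1 if rng.random() < P_H else 0
    v = 1 if rng.random() < P_V_GIVEN_H[h] else 0
    return h, v


def exact_posterior(v):
    """p(h=1 | v) by Bayes' rule, used only to check the learned q."""
    like1 = P_V_GIVEN_H[1] if v == 1 else 1.0 - P_V_GIVEN_H[1]
    like0 = P_V_GIVEN_H[0] if v == 1 else 1.0 - P_V_GIVEN_H[0]
    return P_H * like1 / (P_H * like1 + (1.0 - P_H) * like0)


def sleep_phase(steps=50000, lr=0.01, seed=0):
    """Fit q(h=1 | v) = sigmoid(logit[v]) on samples dreamed by the model."""
    rng = random.Random(seed)
    logit = [0.0, 0.0]  # one inference parameter per value of v
    for _ in range(steps):
        h, v = sample_from_model(rng)
        # Gradient of log q(h | v) with respect to logit[v] is h - q(h=1 | v).
        logit[v] += lr * (h - sigmoid(logit[v]))
    return [sigmoid(x) for x in logit]


q = sleep_phase()
for v in (0, 1):
    print(v, round(q[v], 3), round(exact_posterior(v), 3))
```

In this sketch the learned q(h=1 | v) should end up close to the exact posterior for both values of v. It also shows the drawback described in the passage: the inference parameters are only updated for values of v that the model itself generates, so a v that is rare under the model receives few updates regardless of how common it is in real data.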
Of course, these ideas are purely speculative, and there is no hard evidence to suggest that dreaming accomplishes either of these goals. Dreaming may also serve reinforcement learning rather than probabilistic modeling, by sampling synthetic experiences from the animal’s transition model, on which to train the animal’s policy. Or sleep may serve some other purpose not yet anticipated by the machine learning community. |