Hacker News new | ask | show | jobs
by gabrielgoh 3151 days ago
could you elaborate on this point? what you're saying sounds like dynamic programming, which does not reduce the state space at all, just saves on redundant computations (and is a favourite of programming interviews everywhere)
1 comments

Training via bootstrapping (i.e. dynamic programming) does reduce the state space for search when working with function approximation. It represents a bias in what sorts of values the value function approximator should predict for each state. It encodes a sort of "local continuity/coherence" constraint that wouldn't necessarily be induced by simply training to predict the raw values -- collected stochastically via interaction with the environment. This local coherence constraint acts as a regularizer (i.e. bias) while training the value function approximator.