by eggie5 871 days ago
If you have a probabilistic NN, how do you update the prior after sampling from the posterior?
1 comments

You take the action which you computed to be optimal under the hypothetical of your posterior sample; this then yields a new observation. You add that to the dataset, and train a new NN.
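A minimal sketch of that loop, assuming a Bernoulli bandit with a conjugate Beta posterior as a stand-in for the NN (the names `fit_posterior` and `thompson_loop` are made up for illustration). The point is only the shape of the procedure: sample a hypothesis from the posterior, act optimally under it, observe, append to the dataset, and refit on everything.

```python
import random

def fit_posterior(dataset, n_arms):
    """'Train from scratch': start at a Beta(1, 1) prior per arm and
    absorb every (arm, reward) pair seen so far."""
    params = [[1, 1] for _ in range(n_arms)]
    for arm, reward in dataset:
        params[arm][0] += reward       # successes
        params[arm][1] += 1 - reward   # failures
    return params

def thompson_loop(true_probs, n_rounds, seed=0):
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_rounds):
        # Refit on the full dataset (stand-in for training a new NN).
        posterior = fit_posterior(dataset, len(true_probs))
        # Draw one hypothesis from the posterior.
        sample = [rng.betavariate(a, b) for a, b in posterior]
        # Take the action that is optimal under that hypothesis.
        arm = max(range(len(sample)), key=sample.__getitem__)
        # Observe the reward and add it to the dataset.
        reward = 1 if rng.random() < true_probs[arm] else 0
        dataset.append((arm, reward))
    return dataset, fit_posterior(dataset, len(true_probs))

# Hypothetical environment: three arms with made-up success rates.
dataset, posterior = thompson_loop([0.2, 0.5, 0.8], n_rounds=500)
```

With an actual NN, `fit_posterior` would be whatever approximate inference you use, and the per-arm sample would be a sampled set of weights; nothing here says how well that approximation behaves.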
ah, so observe the reward and then take a gradient step
(Well, not necessarily, which is why I framed it as training from scratch: to make it clear that it doesn't necessarily have anything to do with SGD or HMC etc. In theory it shouldn't matter, by the likelihood principle, but in practice, taking a gradient step might not give you the same model as you would get by training from scratch. You'd like it to be a gradient step because that would save you a ton of compute, but I don't know how well Bayesian NNs actually do that. And even if that works OK in supervised problems or the simplest bandit RL, it might not work in full PSRL uses because DRL is so unstable.)
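To make the "in theory it shouldn't matter" half concrete, here is a small sketch using exact conjugate inference (a Beta-Bernoulli model, chosen purely for illustration; `update` and `fit_from_scratch` are hypothetical names). Updating one observation at a time lands on exactly the same posterior as refitting on the whole dataset. That equality is a property of exact inference; a gradient step on an approximate Bayesian NN carries no such guarantee.

```python
def update(params, reward):
    """Incremental update: fold one Bernoulli observation into Beta(a, b)."""
    a, b = params
    return (a + reward, b + 1 - reward)

def fit_from_scratch(rewards, prior=(1, 1)):
    """Batch update: compute the posterior from the prior and all data at once."""
    a, b = prior
    successes = sum(rewards)
    return (a + successes, b + len(rewards) - successes)

rewards = [1, 0, 1, 1, 0, 1]  # made-up observations

incremental = (1, 1)          # Beta(1, 1) prior
for r in rewards:
    incremental = update(incremental, r)

batch = fit_from_scratch(rewards)

print(incremental, batch)     # both are (5, 3)
```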