by judk 4514 days ago
If you learn that all actions `a` from state `s_i` have very low reward, does that propagate backward to the pairs `(s_j, a)` that feed into `s_i`?
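One way to poke at this, assuming the context is tabular Q-learning (the comment doesn't say which algorithm is meant): a minimal sketch on a made-up two-state MDP. The states, rewards, `ALPHA`, `GAMMA`, and the helper `q_learning_sweeps` are all hypothetical and chosen only for illustration. In this toy setup the bootstrapped target `r + GAMMA * max Q(s', .)` is what carries the low value of `s_i` back to the `(s_j, a)` pair that leads into it.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)

# Hypothetical toy MDP: (state, action) -> (reward, next_state).
# next_state of None marks a terminal transition.
S_J, S_I = 0, 1
transitions = {
    (S_J, 0): (0.0, S_I),      # s_j, a=0 feeds into s_i
    (S_J, 1): (0.0, None),     # s_j, a=1 avoids s_i entirely
    (S_I, 0): (-10.0, None),   # every action from s_i has low reward
    (S_I, 1): (-10.0, None),
}

def q_learning_sweeps(n_sweeps):
    """Apply the Q-learning update to every known transition, n_sweeps times."""
    q = {sa: 0.0 for sa in transitions}
    for _ in range(n_sweeps):
        for (s, a), (r, s_next) in transitions.items():
            if s_next is None:
                target = r
            else:
                # Bootstrapping: the target uses the best value available
                # in the next state. If all of them are low, so is the target.
                target = r + GAMMA * max(q[(s_next, b)] for b in (0, 1))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q

q = q_learning_sweeps(50)
for sa in sorted(q):
    print(sa, round(q[sa], 3))
```

Here `Q(s_j, 0)` settles near `GAMMA * -10`, while `Q(s_j, 1)` stays at zero, so a greedy policy learns to steer away from `s_i`. Note that it takes at least one extra update for the change to arrive: `Q(s_j, 0)` only moves after `Q(s_i, .)` has already been lowered.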