Hacker News new | ask | show | jobs
by GaggiX 1152 days ago
>There's nothing analogous to a "reward" or "punishment" when neural networks are learning.

Well deep reinforcement learning.

1 comments

Yeah but even in that case, "reward" is just the thing a NN is trying to predict. The NN itself is not receiving the reward (or any punishment). Instead, it's following gradient signals to improve that estimate of reward, which is then used as a proxy for an optimal policy decision.