Hacker News new | ask | show | jobs
by taneq 443 days ago
In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.
1 comments

That is right, there is usually a parameter on the action selection function -- the exploitation vs exploration balance.