Y
Hacker News
new
|
ask
|
show
|
jobs
by
deep_etcetera
2788 days ago
This approach will be thwarted if the time is shown in the observation space. If this is so then every kth state will give a novelty reward and every other state will give zero, it doesn't matter what the agent does.