by deep_etcetera 2788 days ago
This approach will be thwarted if the time is shown in the observation space. In that case the clock alone makes later observations look different from earlier ones, so every kth state will give a novelty reward and every other state will give zero, no matter what the agent does.
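A toy sketch of the failure mode, not the method under discussion: I'm assuming a 1-D walk, a memory of past observations, and a rule that an observation is novel when it is at least k away (Chebyshev distance) from everything in memory. The names `novelty_rewards`, `stay`, `go_right` and `wander` are made up for illustration.

```python
import random

def novelty_rewards(policy, steps, k, include_time):
    """Per-step novelty rewards for a 1-D walk driven by `policy`."""
    pos, memory, rewards = 0, [], []
    for t in range(steps):
        # Assumption: the observation is the position, plus the clock if shown.
        obs = (pos, t) if include_time else (pos,)
        # Novel if at least k away from every stored observation.
        novel = all(
            max(abs(a - b) for a, b in zip(obs, seen)) >= k
            for seen in memory
        )
        if novel:
            memory.append(obs)
        rewards.append(1 if novel else 0)
        pos += policy(t)  # each action moves the agent by -1, 0, or +1
    return rewards

def stay(t):
    return 0

def go_right(t):
    return 1

_rng = random.Random(0)

def wander(t):
    return _rng.choice((-1, 0, 1))

if __name__ == "__main__":
    for name, policy in (("stay", stay), ("go_right", go_right), ("wander", wander)):
        # With the clock visible, every policy prints [1, 0, 0, 1, 0, 0, 1].
        print(name, novelty_rewards(policy, steps=7, k=3, include_time=True))
    # Without the clock, standing still earns only the first reward.
    print("stay, no clock", novelty_rewards(stay, steps=7, k=3, include_time=False))
```

Since the agent moves at most one cell per step, the position gap between two observations can never exceed the time gap, so with the clock visible the distance is just the time gap and the reward lands on every kth step for any policy. Hide the clock and the reward depends on where the agent actually goes.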