Hacker News
by visarga 3234 days ago
It's not impossible; it's called inverse reinforcement learning. The agent learns a reward (or value) function from external demonstrations, then uses that function to train an action policy. Intuitively, the idea is to first learn which states are good and which are bad from the demonstrations, then use that to teach the bot how to act.

This kind of learning is similar to GANs, where the discriminator learns from real data and the generator learns from the discriminator.
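
The two-stage idea above can be sketched in a few lines of Python. This is only a toy illustration, not a real IRL algorithm: the five-state chain world, the names `learn_state_scores` and `plan_policy`, the demonstration trajectories, and the log-frequency scoring rule are all assumptions made up for this example. Stage 1 scores states by how often the demonstrator visits them; stage 2 runs value iteration against those scores to get a policy.

```python
import math
from collections import Counter

N_STATES = 5          # hypothetical chain world: states 0..4
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor (assumed)


def step(state, action):
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)


def learn_state_scores(demos):
    """Stage 1: score each state from demonstrations.

    Uses the log-ratio of smoothed visit frequency to a uniform baseline.
    This is a crude stand-in for a learned reward, chosen for brevity.
    """
    visits = Counter(s for traj in demos for s in traj)
    total = sum(visits.values())
    scores = []
    for s in range(N_STATES):
        freq = (visits[s] + 1) / (total + N_STATES)  # Laplace smoothing
        scores.append(math.log(freq * N_STATES))
    return scores


def plan_policy(scores, sweeps=100):
    """Stage 2: value iteration against the learned scores."""
    def q(s, a, values):
        nxt = step(s, a)
        return scores[nxt] + GAMMA * values[nxt]

    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [max(q(s, a, values) for a in ACTIONS)
                  for s in range(N_STATES)]
    # Greedy policy: pick the action with the highest backed-up score.
    return [max(ACTIONS, key=lambda a: q(s, a, values))
            for s in range(N_STATES)]


# Made-up demonstrations: the "expert" walks right and stays at state 4.
demos = [[0, 1, 2, 3, 4, 4, 4], [2, 3, 4, 4, 4], [1, 2, 3, 4, 4]]
scores = learn_state_scores(demos)
policy = plan_policy(scores)
print(scores)
print(policy)
```

In this toy setup the bot never sees a hand-written reward; it only sees where the demonstrator spends time, which loosely mirrors how a GAN generator only sees the discriminator's judgment rather than the real data.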

1 comment

Very interesting! Thanks for sharing -- I'll look more into this.