Hacker News new | ask | show | jobs
by howlin 2005 days ago
One thing to keep in mind about direct (learn the policy/behavior) versus indirect (learn the model and then simulate behaviors on the model to choose the best) is that sometimes it's much easier to find a good enough policy than it is to learn an accurate enough model for simulation. Driving is a good example of this. Most of the time all you need to do is stay in your lane and obey the rules for intersections. A simulation of a driving environment, on the other hand, is quite difficult.