by ilaksh 2002 days ago
I used to be a bit more excited about RL. I mean, it's still definitely something I have to learn, but one aspect of it _seems_ lacking to me and is messing with my motivation to learn it. I'm sure someone will happily explain all the ways I am ignorant.

It seems like there is a lot of emphasis on "direct RL" or whatever, where they don't really think about the model much, and I guess it often ends up implicit inside of the policy or something?

But it seems to me as someone who has just started learning about robotics, that I absolutely need to first verify that I have an accurate model of the environment which I can inspect. It seems like a lot of RL approaches might not even be able to supply that.

I mean what I am stuck on as far as creating a robot (or virtual robot) is having a vision system that does all of the hard things I want. I feel like if I can detect edges and surfaces and shapes in 3D, parts of objects and objects, with orientation etc., and in a way I can display and manipulate it, that level of understanding will give me a firm base to build the rest of learning and planning on.

I know all of that is very hard. It seems like they must have tried that for a while and then kind of gave up to head down the current direction of RL? Or just decided it wasn't important. I still think it's important.

3 comments

I don't think people have given up on model-based RL. It is just that describing a proper model is (like you are saying) very difficult.

In case you haven't seen or read the following: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

You do not necessarily need to fully know the environment you are in, but you need to be able to evaluate how good the actions you can take are in terms of a utility function. That's how an RL algorithm can learn that going through a wall is a bad decision (reward("ahead") <= 0), and then choose something else, such as turning left or right (reward("left") > 0 or reward("right") > 0).
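To make the wall example a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the three action names, the `reward` function (-1 for bumping into the wall, +1 for turning), and the `learn_action_values` helper. It is just an epsilon-greedy agent keeping a running estimate of how good each action is, with no model of the environment at all.

```python
import random

ACTIONS = ["ahead", "left", "right"]

def reward(action):
    # Hypothetical reward signal: walking into the wall is penalised,
    # turning either way is rewarded.
    return -1.0 if action == "ahead" else 1.0

def learn_action_values(steps=200, epsilon=0.2, alpha=0.5, seed=0):
    """Estimate the value of each action purely from observed rewards."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)       # explore
        else:
            action = max(ACTIONS, key=q.get)   # exploit current estimates
        # Move the estimate a fraction alpha towards the observed reward.
        q[action] += alpha * (reward(action) - q[action])
    return q

q = learn_action_values()
best = max(ACTIONS, key=q.get)
print(q, "->", best)
```

The agent never inspects the wall itself. It only sees that "ahead" keeps paying badly, so its estimate for that action goes negative and it settles on a turn.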

I think the main problem with RL is deciding whether a utility function, as precise as it may be, can fully capture all the nuances of an environment. Another problem is adapting to the environment when new actions are added dynamically to your model, and getting it to converge as quickly as possible.

One thing to keep in mind about direct (learn the policy/behavior) versus indirect (learn the model and then simulate behaviors on the model to choose the best) is that sometimes it's much easier to find a good enough policy than it is to learn an accurate enough model for simulation. Driving is a good example of this. Most of the time all you need to do is stay in your lane and obey the rules for intersections. A simulation of a driving environment, on the other hand, is quite difficult.
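A rough sketch of the two approaches side by side, in Python. The corridor environment (`step`, five cells with the goal at the right end) and both helper functions are hypothetical, and I'm using tabular Q-learning and value iteration as stand-ins for "direct" and "indirect". In a toy like this both are trivial. The driving point is that in a real environment the "learn the model" step is the part that gets hard.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

def step(state, action):
    """Hypothetical toy environment: a corridor with the goal at the right end."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def direct_policy(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Direct: learn action values straight from experience, no model kept."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(50):                      # cap the episode length
            a = rng.choice(ACTIONS)              # explore uniformly at random
            nxt, r = step(s, a)
            future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = nxt
            if s == GOAL:
                break
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}

def indirect_policy(gamma=0.9, sweeps=50):
    """Indirect: learn a model first, then plan by simulating on the model."""
    # Trying each (state, action) once is enough only because this toy
    # environment is deterministic.
    model = {(s, a): step(s, a) for s in range(GOAL) for a in ACTIONS}

    v = [0.0] * N_STATES                         # goal state stays at 0 (terminal)

    def backup(s, a):
        nxt, r = model[(s, a)]
        return r + gamma * v[nxt]

    for _ in range(sweeps):                      # value iteration on the model only
        for s in range(GOAL):
            v[s] = max(backup(s, a) for a in ACTIONS)
    return {s: max(ACTIONS, key=lambda a: backup(s, a)) for s in range(GOAL)}

direct = direct_policy()
indirect = indirect_policy()
print(direct)
print(indirect)
```

Both end up with the same "always move right" policy. The difference is that `indirect_policy` leaves you with a `model` table you can inspect, while `direct_policy` only leaves you with action values.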