by jrx
2849 days ago
I’d be very interested to hear for which similar problems (games, control problems) reinforcement learning is not the best approach (i.e. there exists another, qualitatively better method). We hear a lot about success stories but not a lot about the limitations of RL.
|
Anecdotally, I've heard about some optimization problems that should be a good fit for dynamic programming or reinforcement learning where it turns out they actually don't seem to work as desired.
For example, optimizing elevator policy: an agent controlling an elevator wants to achieve high passenger throughput while also minimizing the wait time of each individual. We want people to get to their floor quickly, but we also want to avoid having any individual wait for an excessively long time. The agent can only observe which buttons have been pressed on each floor, although it gets a reward that reflects our desiderata[1]. As it turns out, most elevators are not running some sort of cool machine-optimized policy, but rather something hand-coded.
This is not for lack of opportunity, either: apparently, some researchers pitched Otis on this in the 90s, and it didn't work as well as what they already had, despite this looking like a case where theory should match reality. Why this is, I don't know, but it might come down to the fact that there's really a lot more to a "pleasant elevator experience" than might be assumed (or modeled by the reward function), or perhaps the hand-coded policy incorporates background knowledge about human vertical locomotion that is unavailable to a machine.
-----
1. Maybe something like `R_t = Transporting(t) - ∑_i Waiting(i, t)`, meaning roughly "the number of people currently being transported, minus the total time that everyone still waiting has waited so far".
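
To make the footnote concrete, here is a rough Python sketch of that reward. The `Passenger` class, its fields, and the `reward` function are names I made up for illustration, and it assumes time is counted in discrete steps and that the list only contains people who have already shown up. It is just one way to read the formula, not what any real elevator controller uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passenger:
    # Hypothetical record for illustration only.
    arrival_time: int   # time step at which the passenger pressed the call button
    in_elevator: bool   # True once picked up, False while still waiting


def reward(passengers: List[Passenger], t: int) -> int:
    """R_t = Transporting(t) - sum_i Waiting(i, t)."""
    # Transporting(t): number of people currently riding.
    transporting = sum(1 for p in passengers if p.in_elevator)
    # Waiting(i, t): how long each still-waiting passenger has waited so far.
    waiting_penalty = sum(t - p.arrival_time for p in passengers if not p.in_elevator)
    return transporting - waiting_penalty


passengers = [
    Passenger(arrival_time=0, in_elevator=True),
    Passenger(arrival_time=2, in_elevator=False),
    Passenger(arrival_time=5, in_elevator=False),
]
print(reward(passengers, t=10))
```

Note that the penalty grows every step someone is left waiting, so a single neglected passenger eventually dominates the reward, which is the "don't let anyone wait forever" part of the desiderata.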