|
There are a couple of places where RL/Evolutionary Methods/Something Else are jockeying for the top of the leaderboards, but usually if one technique succeeds over the others it just means that someone's put a lot of work into that one particular approach. Anecdotally, I've heard about some optimization problems that should be a good fit for dynamic programming or reinforcement learning, but where those methods turn out not to work as well as hoped. For example, optimizing elevator policy: an agent controlling an elevator wants to achieve high passenger throughput while also minimizing each individual's wait time.
We want people to get to their floor quickly but also want to avoid having any individual wait for an excessively long period of time.
The agent can only observe which buttons have been pressed on each floor, although it gets a reward that reflects our desiderata[1].
As it turns out, most elevators are not running some sort of cool machine-optimized policy, but rather something hand-coded. This is not for lack of opportunity, either: apparently, some researchers pitched Otis on this in the 90s, and the learned policy didn't work as well as what they already had, despite this looking like a case where theory should match reality.
Why this is, I don't know, but it might come down to the fact that there's really a lot more to a "pleasant elevator experience" than might be assumed (or modeled by the reward function), or perhaps the hand-coded policy incorporates background knowledge about human vertical locomotion unavailable to a machine.

-----

1. Maybe something like `R_{t} = Transporting(t) - ∑_{i} Waiting(i, t)`, meaning roughly "the number of people currently being transported, minus the sum over everyone still waiting of how long they've been waiting so far". |
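To make the reward in footnote 1 concrete, here is a minimal sketch of how a simulator might compute it. The `Passenger` record and its fields are hypothetical names invented for illustration. This is one guess at what the formula means, not what any real elevator controller (or the researchers mentioned above) actually used. Note that only the simulator sees this information. The agent itself still only observes button presses.

```python
from dataclasses import dataclass


@dataclass
class Passenger:
    # Hypothetical record kept by the simulator, not visible to the agent.
    arrival_time: int   # time step at which the passenger started waiting
    in_elevator: bool   # True once the passenger has boarded


def reward(passengers, t):
    """R_t = Transporting(t) - sum_i Waiting(i, t), as read from footnote 1."""
    # Number of people currently riding in the elevator.
    transporting = sum(1 for p in passengers if p.in_elevator)
    # Total time waited so far, summed over everyone who has not boarded yet.
    waiting = sum(t - p.arrival_time for p in passengers if not p.in_elevator)
    return transporting - waiting


passengers = [
    Passenger(arrival_time=0, in_elevator=True),   # already riding
    Passenger(arrival_time=2, in_elevator=False),  # has waited 4 steps at t=6
    Passenger(arrival_time=5, in_elevator=False),  # has waited 1 step at t=6
]
print(reward(passengers, t=6))  # 1 - (4 + 1) = -4
```

Because the waiting term grows every step someone is left standing, a reward like this penalizes long individual waits more and more heavily over time, which is at least one way to encode "don't strand anybody". Whether that captures a pleasant elevator experience is exactly the open question.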
It's ostensibly the case for other RL problems that we've only recently crossed some compute power viability threshold. Maybe contemporary researchers with contemporary compute power would have better luck?