

	by jrx 2849 days ago
	I’d be very interested to hear for which similar problems (games, control problems) reinforcement learning is not the best approach (i.e., there exists another, qualitatively better method). We hear a lot about success stories but not a lot about the limitations of RL.

3 comments

clickok 2849 days ago

There are a couple of places where RL/Evolutionary Methods/Something Else are jockeying for the top of the leaderboards, but usually if one technique succeeds over the others it just means that someone's put a lot of work into that one particular approach.

Anecdotally, I've heard about some optimization problems that should be a good fit for dynamic programming or reinforcement learning where it turns out they actually don't seem to work as desired.

For example, optimizing elevator policy: an agent controlling an elevator wants to achieve high throughput of passengers while also minimizing the wait time of each individual. We want people to get to their floor quickly but also want to avoid having any individual wait for an excessively long period of time. The agent can only observe information about the buttons pressed on each floor, although it gets a reward that reflects our desiderata[1]. As it turns out, most elevators are not running some sort of cool machine-optimized policy, but rather something hand-coded.

This is not for lack of opportunity, either: apparently, some researchers pitched Otis on this in the 90s, and it didn't work as well as what they already had, despite this looking like a case where theory should match reality. Why this is, I don't know, but it might come down to the fact that there's really a lot more to a "pleasant elevator experience" than might be assumed (or modeled by the reward function), or perhaps the hand-coded policy incorporates background knowledge about human vertical locomotion unavailable to a machine.

-----

1. Maybe something like `R_{t} = Transporting(t) - ∑_i Waiting(i, t)`, meaning something like "the number of people currently being transported, minus the sum over everyone still waiting of how long they've been waiting".
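
To make the footnote concrete, here is a minimal sketch of that reward in Python. The function name, its arguments, and the choice of units are hypothetical; it just computes the guessed formula above, not whatever the Otis researchers actually used.

```python
from typing import Sequence


def elevator_reward(num_transporting: int, waiting_times: Sequence[float]) -> float:
    """Sketch of R_t = Transporting(t) - sum_i Waiting(i, t).

    num_transporting: people currently riding the elevator at time t.
    waiting_times: for each person still waiting at time t, how long
        they have been waiting so far (units are an assumption).
    """
    # Reward riders in transit, penalize the accumulated waiting time.
    return num_transporting - sum(waiting_times)


# Three riders in the car, two people waiting for 2 and 5 time units.
print(elevator_reward(3, [2.0, 5.0]))  # 3 - (2 + 5) = -4.0
```

Note that the agent in the setup above only sees button presses, so a term like `waiting_times` would be available to whoever computes the reward (e.g. a simulator), not to the policy itself.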


hhmc 2848 days ago

Or perhaps those researchers in the 90s just whiffed?

It's ostensibly the case for other RL problems that we've only recently crossed some compute power viability threshold. Maybe contemporary researchers with contemporary compute power would have better luck?


tnecniv 2849 days ago

One class of examples would be problems where probing the environment is expensive. In many robotics applications, this can be tricky because the robot might break or not return to a suitable initial condition without human oversight. Moreover, the physical experiment can be relatively slow, so little data is generated, and what is generated may not generalize well. That being said, how to overcome these challenges is a very active research topic in the community.


hhmc 2849 days ago

It probably depends on how you define "best approach".

For example, in chess: sure, AlphaZero is better than Stockfish, but Stockfish is "good enough" (100s of Elo better than the best human), without the CPU-cycle cost of AlphaZero.
