|
My understanding is that RL is a reasonable approach when the environment is (1) mathematically uncharacterized, (2) insufficiently characterized, or (3) characterized, but the resulting model is too complex to use. In those cases RL simultaneously explores the environment in simple ways and takes actions to maximize some objective function. However, there are many environments (chemical/power plants, machines, etc.) with good mathematical or empirical data-based models, where model-based optimal control works extremely well in practice (much better than RL). I'm wondering why the ML community has elected to skip over this latter class of problems, with its large swaths of proven applications, and has instead gone directly to RL, which is a really hard problem. Is it to publish more papers? Or because of self-driving cars?* (* Optimal control tends not to work too well in highly uncertain, non-characterized, changing environments; self-driving cars are one such environment, where even the sensing problem is highly complicated, let alone control.) |
Model-based methods let you do some pretty fancy stuff while massively reducing the number of data samples you need, but there's a trade-off. Using the model tends to require lots of not-very-parallelizable computation, so it can be more expensive computationally. Very large problems can get out of hand pretty quickly, and there's still a lot of work to do before there is something that can be applied quickly and efficiently in general.
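
To make the "not-very-parallelizable" point concrete, here is a minimal sketch of one classic model-based controller: a finite-horizon LQR solved with a backward Riccati recursion. Everything in it is an illustrative assumption on my part (the double-integrator model, the cost weights `Q` and `R`, the horizon, and the function names `finite_horizon_lqr` and `simulate`); it is not tied to any particular plant. The thing to notice is that each step of the recursion needs the result of the previous one, so the loop cannot simply be spread across workers.

```python
import numpy as np


def finite_horizon_lqr(A, B, Q, R, horizon):
    """Feedback gains for x[t+1] = A x[t] + B u[t] with quadratic cost.

    Runs the Riccati recursion backward in time, using Q as the
    terminal cost (an assumption made for this sketch).
    """
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        # P here is the cost-to-go matrix of the *next* time step, so
        # every iteration depends on the previous one: inherently sequential.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[0] is now the gain for the first time step
    return gains


def simulate(A, B, gains, x0):
    """Roll the known model forward under the feedback law u = -K x."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for K in gains:
        u = -K @ x
        x = A @ x + B @ u
        trajectory.append(x)
    return np.array(trajectory)


# Hypothetical plant: a double integrator (position, velocity), dt = 0.1.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)            # state cost (assumed)
R = np.array([[0.1]])    # control cost (assumed)

gains = finite_horizon_lqr(A, B, Q, R, horizon=100)
trajectory = simulate(A, B, gains, x0=[1.0, 0.0])
print("final state:", trajectory[-1])
```

No environment samples are used at all here, which is the sample-efficiency side of the trade-off. The cost is in the solve: for a 2-state toy it is trivial, but the matrices grow with the state dimension, and nonlinear or constrained versions need an optimization at every step instead of a closed-form recursion.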