Hacker News
Meta-Reinforcement Learning (blog.floydhub.com)
65 points by mtrazzi 2643 days ago
4 comments

This seems to be a contextual bandit where the previous reward is included in the context.
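One way to read "previous reward included in the context": at each step, a meta-RL policy receives the previous action and the previous reward alongside any task observation. A minimal sketch of building that per-step input, assuming arms are indexed integers encoded one-hot; the helper name `meta_rl_input` is hypothetical and not taken from the linked article.

```python
def meta_rl_input(prev_action, prev_reward, num_arms, observation=None):
    """Build the per-step input for a meta-RL policy (hypothetical helper).

    Assumes actions are arm indices, encoded one-hot. On the first step of an
    episode there is no previous action, so pass prev_action=None and
    prev_reward=0.0.
    """
    one_hot = [0.0] * num_arms
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    # Any task observation (the usual contextual-bandit context) goes first.
    obs = [float(v) for v in observation] if observation is not None else []
    # Appending the previous reward is what lets the policy adapt within an episode.
    return obs + one_hot + [float(prev_reward)]


# Two-armed bandit: arm 1 was pulled last and paid out 1.
print(meta_rl_input(prev_action=1, prev_reward=1.0, num_arms=2))  # [0.0, 1.0, 1.0]
```

The sketch only builds the input. In setups of this kind the policy is typically recurrent, so its hidden state can integrate the whole history of pulls rather than just the last reward.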

I can’t come up with real-world examples where the reward function changes like this in a way that would warrant making different decisions based on the previous reward.

While a contextual bandit could learn from more information about the environment than the multi-armed bandit in the example, it still has the goal of finding the best reward in a specific domain (and by and large uses the same strategy as the multi-armed bandit). But placed in a different context, the state space and underlying model parameters could be completely different, so the previous reward for any given state, or sequence of states, could be irrelevant.

The goal here is to reward the agent for the search strategy it employs to arrive at its answer, not for the quality of the answer itself.
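To make this concrete: the agent is scored on its total reward over a whole episode on a freshly sampled task, so what gets optimized is how quickly it finds the good arm, not any single answer. A minimal sketch, assuming Bernoulli arms whose probabilities are drawn uniformly per task; the two hand-written strategies are hypothetical stand-ins for what a trained recurrent policy might learn, and all function names are illustrative.

```python
import random


def sample_task(rng, num_arms=2):
    """Draw a fresh Bernoulli bandit task (assumed: arm probabilities uniform on [0, 1])."""
    return [rng.random() for _ in range(num_arms)]


def pull(rng, probs, arm):
    """Return reward 1.0 with probability probs[arm], else 0.0."""
    return 1.0 if rng.random() < probs[arm] else 0.0


def random_strategy(rng, probs, horizon):
    """Baseline: pull a uniformly random arm every step, never using feedback."""
    return sum(pull(rng, probs, rng.randrange(len(probs))) for _ in range(horizon))


def explore_then_commit(rng, probs, horizon, pulls_per_arm=5):
    """Try every arm a few times, then commit to the empirically best one."""
    totals = [0.0] * len(probs)
    episode_return = 0.0
    for arm in range(len(probs)):
        for _ in range(pulls_per_arm):
            r = pull(rng, probs, arm)
            totals[arm] += r
            episode_return += r
    best = max(range(len(probs)), key=lambda a: totals[a])
    for _ in range(horizon - pulls_per_arm * len(probs)):
        episode_return += pull(rng, probs, best)
    return episode_return


def meta_objective(strategy, num_tasks=500, horizon=100, seed=0):
    """Average episode return over many sampled tasks: this scores the strategy itself."""
    rng = random.Random(seed)
    returns = [strategy(rng, sample_task(rng), horizon) for _ in range(num_tasks)]
    return sum(returns) / num_tasks


print("random:", meta_objective(random_strategy))
print("explore-then-commit:", meta_objective(explore_then_commit))
```

Neither strategy is told which arm is best on a given task; explore-then-commit scores higher on average only because its way of searching is better, and that average episode return is the kind of signal a meta-RL agent is trained on.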

One possible use case (directly related to their multi-armed bandit example; it could possibly be learned by a contextual bandit, but that would require a good deal more modeling) is retail pricing, where different product categories have drastically different demand curves. A meta-learning algorithm promises to generalize better and to arrive rapidly at the optimal price across a wide range of similar price curves.
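A worked version of the pricing idea, under an assumed linear demand curve q = a - b·p: revenue p·q peaks at p* = a/(2b), so categories with different (a, b) have very different optimal prices. A single learned price does not transfer, but a search strategy can. The sketch uses a hypothetical "probe two prices, fit a line, price at its optimum" strategy as a stand-in for what a meta-learned pricer might discover; the categories and coefficients are made up for illustration.

```python
def revenue(price, a, b):
    """Revenue under an assumed linear demand curve q = a - b * price (demand floored at 0)."""
    return price * max(a - b * price, 0.0)


def optimal_price(a, b):
    """Maximizer of price * (a - b * price): set the derivative a - 2 * b * price to zero."""
    return a / (2.0 * b)


def probe_and_price(demand, low, high):
    """Hypothetical fast-adaptation strategy: observe demand at two prices,
    fit the line q = a - b * p through them, and return that line's optimal price."""
    q_low, q_high = demand(low), demand(high)
    b = (q_low - q_high) / (high - low)
    a = q_low + b * low
    return optimal_price(a, b)


# Made-up demand coefficients (a, b) for two product categories.
categories = {"groceries": (100.0, 20.0), "electronics": (50.0, 0.1)}

for name, (a, b) in categories.items():
    found = probe_and_price(lambda p: a - b * p, low=1.0, high=2.0)
    print(f"{name}: optimal {optimal_price(a, b):.2f}, found {found:.2f}")
```

With these made-up coefficients, charging the groceries price of 2.50 for electronics earns 2.5 × 49.75 = 124.375 against 6,250 at the optimum, while the two-probe strategy transfers unchanged. Real demand is noisy and not linear, so a learned pricer would also have to decide how long to keep probing; this sketch ignores both.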

We already know that deep RL is sample-inefficient... is meta-RL really useful for anything nontrivial? This seems rather silly.
The first applications look really simple because they illustrate cognitive abilities, drawn from neuroscience and psychology, that mimic planning or model-based RL.

It doesn't mean that meta-RL won't scale up with more computation (see http://www.incompleteideas.net/IncIdeas/BitterLesson.html).

I don't really get the point of your article. You seem quite dogmatic and don't even discuss other hypotheses.
This article's intended audience is clearly the subset of ML engineers who have a practical rather than a theoretical or academic background in ML. I'd argue it is definitely useful as a practical guide to an original approach to RL that looks like it has good potential. You could fairly argue that the article would benefit from more mathematical grounding, as AMS blog posts tend to have. However, consider how unreasonably afraid of mathematics much of the CS crowd tends to be. The pedagogy of the article is noteworthy: it helps the reader get a hold of the jargon and ideas of this burgeoning approach and prepares them for further research, somewhat like Quanta Magazine tries to do, while allowing itself more technicality, in line with the blog post format. That is not an easy task given how multidisciplinary meta-RL is, and the author does a rather great job, IMO.
Any examples of real-world applications?
For more information about why meta-learning (and in particular meta-RL) is useful see this post: https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/