by smuser
931 days ago
My understanding (not an expert) is that a lot of problem domains have very sparse, infrequent rewards. Imagine if the only reward you gave a Minecraft agent was for mining a diamond: it would take a lot of gameplay before it randomly did that and got any reward at all. So researchers spend time hand-tuning the reward function (oh, you mined some dirt, here's a tiny reward; oh, you mined rock, here's a bigger one; etc.), but that's kind of akin to hand-crafted feature detection from the pre-neural-network days. The Q* mystery is whether OpenAI 'solved' reward modelling the same way neural networks solved feature detection.
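To make the sparse vs. hand-tuned distinction concrete, here's a rough Python sketch. Everything in it is made up for illustration: the block names, the reward values, the mining probabilities, and the helper names (`sparse_reward`, `shaped_reward`, `random_step`, `count_rewarded_steps`) are assumptions, not anything from a real Minecraft environment or from OpenAI. It just counts how often a randomly acting agent sees any reward signal under each scheme.

```python
import random

# Hypothetical block types and reward values, chosen only for illustration.
SHAPED_REWARDS = {"dirt": 0.01, "stone": 0.05, "iron": 0.3, "diamond": 1.0}

# Assumed chance that one random action mines each block type.
MINE_PROBS = {"dirt": 0.30, "stone": 0.10, "iron": 0.01, "diamond": 0.0001}


def sparse_reward(block):
    """Reward only the final goal: mining a diamond."""
    return 1.0 if block == "diamond" else 0.0


def shaped_reward(block):
    """Hand-tuned rewards for intermediate progress as well as the goal."""
    return SHAPED_REWARDS.get(block, 0.0)


def random_step(rng):
    """Return the block mined by one random action, or None if nothing was mined."""
    roll = rng.random()
    cumulative = 0.0
    for block, prob in MINE_PROBS.items():
        cumulative += prob
        if roll < cumulative:
            return block
    return None


def count_rewarded_steps(reward_fn, steps, seed=0):
    """Count how many random steps produce a nonzero reward under reward_fn."""
    rng = random.Random(seed)
    return sum(1 for _ in range(steps) if reward_fn(random_step(rng)) > 0)


if __name__ == "__main__":
    steps = 10_000
    print("steps with reward (sparse):", count_rewarded_steps(sparse_reward, steps))
    print("steps with reward (shaped):", count_rewarded_steps(shaped_reward, steps))
```

With these made-up numbers the shaped version gives the agent feedback on a large fraction of steps, while the sparse version almost never does. The catch is that someone had to pick every value in `SHAPED_REWARDS` by hand, which is the part that resembles hand-crafted features.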