by simianwords
432 days ago
I want to clarify: is the "learn from experience" part still done through offline RL, rather than autonomously and continuously? I think the core idea of the paper is that, while we have already hit the ceiling of the usual kind of data, there is a new kind of data from agents acting in the real world, with users (or someone else?) providing rewards based on some ground truth. Somehow I had misread the paper as saying that this kind of learning would be autonomous and continuous.