by simianwords 432 days ago

I want to clarify whether the "learn from experience" part is still done offline through RL, rather than autonomously and continuously.

I think the core idea of the paper is that, while we have already hit the ceiling with the normal kind of data, there is a new kind of data: agents acting in the real world, with users (or someone else?) providing rewards based on some ground truth.

Somehow I came away from the paper with the impression that this kind of learning would be autonomous and continuous.