Hacker News
by wegfawefgawefg 470 days ago
Well, that's the golden issue with RL: sample efficiency. Training is bounded by how fast the env can produce experience, so you want an architecture that extracts as much information as possible from each collected sample, avoids catastrophic forgetting, and prioritizes samples according to relevance.
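To make "prioritizing samples according to relevance" concrete, here is a minimal sketch of one common way to do it: a replay buffer that samples transitions in proportion to their last TD error. This is an assumption about what relevance could mean, not something the comment specifies. The class name `PrioritizedReplayBuffer` and the parameters `alpha` and `eps` are hypothetical, and the sketch leaves out importance-sampling corrections and any efficient tree structure.

```python
import random


class PrioritizedReplayBuffer:
    """Toy proportional prioritized replay (illustrative, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # assumed knob: 0 = uniform, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_idx = 0       # slot to overwrite once the buffer is full

    def add(self, transition):
        # New samples get the current max priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idxs = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return idxs, [self.items[i] for i in idxs]

    def update(self, idxs, td_errors):
        # Relevance here is |TD error|: surprising samples get replayed more often.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Usage would be roughly: `add` every transition the env produces, `sample` a batch for each gradient step, then `update` with the fresh TD errors from that batch. Keeping old transitions around in the buffer is also one of the usual (partial) mitigations for forgetting, since the network keeps seeing earlier experience.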