Hacker News new | ask | show | jobs
by serialx 510 days ago
Ah sorry, you might be right. I meant "sparse reward" as a reward system that is mostly 0 but occasionally 1. Your "sparse reward" means only providing reward at the end of each output.
1 comments

> Ah sorry, you might be right. I meant "sparse reward" as a reward system that is mostly 0 but occasionally 1.

Did we introduce the abusive pressure of Korean educational culture to machines?