by BlahBoy3 2502 days ago
From my understanding, SARSA can be preferable when mistakes made while learning are costly. SARSA is more conservative because it is on-policy. Its updates bootstrap from the action the exploring policy actually takes next, so its value estimates account for the large negative rewards that occasional exploratory moves can incur. The classic example problem is "cliff walking."[0]
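
Here is a minimal Python sketch of that difference. It assumes the textbook cliff-walking layout: a 4x12 grid, start at bottom-left, goal at bottom-right, -1 per step, and -100 plus a reset to the start for stepping into the cliff. The function names and hyperparameters (`train`, `greedy_path`, alpha=0.5, epsilon=0.1) are my own illustrative choices and are not taken from the linked repo. The only difference between the two methods is the TD target: SARSA uses the next action actually chosen, while Q-learning uses the max.

```python
import random

# Assumed textbook layout: 4 rows x 12 cols, cliff along the bottom row between start and goal.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (clamped to the grid); the cliff costs -100 and resets to START."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the (first) best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))


def sarsa_target(reward, q_next, next_action, gamma):
    # On-policy: bootstrap from the action the exploring policy will actually take.
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, q_next, gamma):
    # Off-policy: bootstrap from the greedy action, ignoring future exploration.
    return reward + gamma * max(q_next)


def train(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular TD control on the grid; method is "sarsa" or "q_learning" (hypothetical API)."""
    if method not in ("sarsa", "q_learning"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            if done:
                target = reward  # terminal state: nothing to bootstrap from
            elif method == "sarsa":
                target = sarsa_target(reward, Q[next_state], next_action, gamma)
            else:
                target = q_learning_target(reward, Q[next_state], gamma)
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = next_state, next_action
    return Q


def greedy_path(Q, max_steps=100):
    """Follow the learned greedy policy from START (capped to avoid infinite loops)."""
    state, path = START, [START]
    for _ in range(max_steps):
        values = Q[state]
        state, _, done = step(state, values.index(max(values)))
        path.append(state)
        if done:
            break
    return path
```

In the textbook version of this experiment, Q-learning learns the shortest route right along the cliff edge but falls off more often while exploring, whereas SARSA learns a longer route farther from the edge and collects more reward per episode during training. Comparing `greedy_path(train("sarsa"))` with `greedy_path(train("q_learning"))` should show a similar pattern, though the exact paths depend on the seed and hyperparameters.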

[0] https://github.com/cvhu/CliffWalking