by BlahBoy3
2502 days ago
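From my understanding, SARSA can be preferable when making a mistake whilst learning is costly. SARSA is on-policy: its update bootstraps from the action the exploratory policy actually takes next, so the values it learns account for the risk of large negative rewards during the exploratory phase, and the resulting behavior is more conservative. Q-learning, by contrast, is off-policy and learns the value of the greedy policy, ignoring those exploratory missteps. The classic example problem is "cliff walking."[0]

[0] https://github.com/cvhu/CliffWalking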
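To make the difference concrete, here is a minimal Python sketch of tabular SARSA and Q-learning on a cliff-walking grid. It assumes the usual textbook setup (4x12 grid, -1 per step, -100 and a reset to the start for stepping into the cliff) and is not taken from the linked repo; all function names and hyperparameters are my own illustrative choices. The only difference between the two methods is the bootstrap target in `td_target`.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (clamped to the grid); the cliff costs -100 and resets to START."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 1 <= c <= COLS - 2:  # bottom row between start and goal
        return START, -100, False
    return (r, c), -1, (r, c) == GOAL


def epsilon_greedy(Q, state, eps, rng):
    """Random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))


def td_target(Q, reward, next_state, next_action, gamma, method):
    if method == "sarsa":
        # On-policy: bootstrap from the action the exploratory policy will actually take.
        return reward + gamma * Q[next_state][next_action]
    # Q-learning (off-policy): bootstrap from the best action, ignoring exploration.
    return reward + gamma * max(Q[next_state])


def train(method, episodes=500, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(Q, state, eps, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, eps, rng)
            # Terminal transitions have no future value to bootstrap from.
            if done:
                target = reward
            else:
                target = td_target(Q, reward, next_state, next_action, gamma, method)
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = next_state, next_action
    return Q


def greedy_path(Q, max_steps=100):
    """Follow the learned greedy policy from START (no exploration)."""
    state, path = START, [START]
    for _ in range(max_steps):
        values = Q[state]
        state, _, done = step(state, values.index(max(values)))
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    for method in ("sarsa", "qlearning"):
        print(method, greedy_path(train(method)))
```

With exploration left on during training, SARSA's greedy path typically detours away from the cliff, while Q-learning's tends to run right along its edge; the exact paths depend on the seed and hyperparameters.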