
	by BlahBoy3 2502 days ago
	From my understanding, SARSA can be the better choice when mistakes made while learning carry a high cost. SARSA is more conservative than Q-learning: because it is on-policy, its value estimates take into account the large negative rewards that exploratory actions can incur. The classic example problem is "cliff walking" [0].

	[0] https://github.com/cvhu/CliffWalking
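To make the difference concrete, here is a rough sketch of both update rules on a cliff-walking grid. It is not taken from the linked repo. The 4x12 layout, the -1/-100 rewards, the hyperparameters, and every name in it (`step`, `epsilon_greedy`, `train`) are my own assumptions, following the usual textbook setup.

```python
import random

ROWS, COLS = 4, 12                 # assumed textbook grid size
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell, clamped to the grid. Cliff: -100 and back to START."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:
        return START, -100, False
    return (r, c), -1, (r, c) == GOAL


def epsilon_greedy(Q, state, eps, rng):
    """Random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))


def train(on_policy, episodes=500, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    """on_policy=True runs SARSA, on_policy=False runs Q-learning."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(ROWS) for c in range(COLS)}
    returns = []
    for _ in range(episodes):
        state, total, done = START, 0, False
        action = epsilon_greedy(Q, state, eps, rng)
        while not done:
            next_state, reward, done = step(state, action)
            if on_policy:
                # SARSA: bootstrap from the action it will actually take,
                # so exploratory falls off the cliff lower nearby values.
                next_action = epsilon_greedy(Q, next_state, eps, rng)
                bootstrap = Q[next_state][next_action]
            else:
                # Q-learning: bootstrap from the greedy action,
                # ignoring the cost of its own exploration.
                bootstrap = max(Q[next_state])
            target = reward + (0.0 if done else gamma * bootstrap)
            Q[state][action] += alpha * (target - Q[state][action])
            if not on_policy:
                next_action = epsilon_greedy(Q, next_state, eps, rng)
            state, action, total = next_state, next_action, total + reward
        returns.append(total)
    return Q, returns


if __name__ == "__main__":
    for name, flag in (("SARSA", True), ("Q-learning", False)):
        _, returns = train(on_policy=flag)
        print(name, sum(returns[-100:]) / 100)
```

The only difference between the two is the bootstrap term. With exploration left on, I would expect SARSA to settle on a path further from the edge and usually get the better average return during learning, while Q-learning learns the shorter edge path and keeps falling off occasionally. Exact numbers will depend on the seed and the hyperparameters.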