by jac08h
511 days ago
While studying for an RL course, I created a reference for several algorithms, each with a brief description of the limitation it addresses. Example:

Problem: SARSA pushes Q-values towards the values of the current policy, but ideally we'd want the optimal values.
Solution: Use the best action in the TD-target calculation -> Q-learning

Perhaps someone else will find it helpful!
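
To make the example concrete, here is a minimal sketch of the two TD targets. It assumes a tabular Q-function stored as a dict mapping each state to a list of action values; the function names, the `gamma` parameter, and the toy numbers are illustrative assumptions, not taken from the reference itself.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    # On-policy: bootstrap from the action the current policy actually took next.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.99):
    # Off-policy: bootstrap from the best available action in the next state.
    return reward + gamma * max(q[next_state])


# Hypothetical Q-table with one state and two actions.
q = {"s1": [1.0, 5.0]}

# Suppose the behaviour policy explored and picked action 0 in s1.
sarsa = sarsa_target(q, reward=1.0, next_state="s1", next_action=0, gamma=0.5)
q_learn = q_learning_target(q, reward=1.0, next_state="s1", gamma=0.5)
```

With these toy values the SARSA target follows the exploratory action, while the Q-learning target uses the greedy one, so the two differ whenever the policy does not act greedily.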
Only wish you publicised it before the exam haha :-)