by isaacimagine 370 days ago
The distinction I mean is DT's reward-to-go vs. Q-learning's Bellman backup (incl. the discount), not the choice of architecture for the policy. You could also do DTs with RNNs (though those have their own problems w/ memory).
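To make the contrast concrete, here's a rough sketch of the two training signals, independent of any network architecture. The function names and toy numbers are made up for illustration; the reward-to-go is the undiscounted suffix sum a DT conditions on, and the Q-learning target is the usual one-step discounted Bellman backup.

```python
def returns_to_go(rewards):
    """Undiscounted suffix sums: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = []
    running = 0.0
    # Walk the trajectory backwards, accumulating future reward.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Bellman target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


# Toy trajectory (hypothetical values).
rewards = [1.0, 0.0, 2.0]
print(returns_to_go(rewards))

# Toy Q-values for the next state (hypothetical values).
print(q_learning_target(1.0, [0.5, 2.0], gamma=0.9))
```

The first comes straight from the logged trajectory, while the second bootstraps from the current Q estimate. Neither cares whether the model underneath is a transformer or an RNN.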
Apologies if we're talking past one another.