| HN Mirror

	by isaacimagine 370 days ago
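	The distinction I'm drawing is between the Decision Transformer's (DT's) reward-to-go conditioning and Q-learning's (QL's) Bellman backup, including the discount factor. It is not about the choice of architecture for the policy. You could also build DTs with RNNs (though those have their own problems with memory). Apologies if we're talking past one another.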
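	To make the contrast concrete, here is a minimal sketch of the two training signals. The function names are made up for illustration, and it assumes undiscounted returns-to-go for the DT side and a standard one-step target for Q-learning; neither is tied to any particular library.

```python
def rewards_to_go(rewards):
    """Undiscounted suffix sums of a trajectory's rewards.

    Assumption: this is the per-step return a DT-style model is
    conditioned on. No bootstrapping and no discount are involved.
    """
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Bellman target: r + gamma * max over a' of Q(s', a').

    Bootstraps from the current Q estimates at the next state and
    applies the discount factor gamma.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    print(rewards_to_go([1.0, 0.0, 2.0]))
    print(q_learning_target(1.0, [0.5, 2.0], gamma=0.5))
```

	Note that neither function says anything about the network that consumes the signal, which is the point: the sequence model could be a transformer or an RNN in either case.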