Hacker News new | ask | show | jobs
by DennisP 921 days ago
> prioritize cumulative long-term feedback over immediate feedback and can adapt to environments with limited observability, sparse feedback, and high stochasticity

Sounds like something that could learn to play decent poker.