by Rhapso
729 days ago
This is related, and it is the paper that lives constantly rent-free in my head. I think it will retroactively be viewed as revolutionary: https://www.alexwg.org/publications/PhysRevLett_110-168702.p... Basically, intelligent behavior is optimizing for "future asymptotic entropy" rather than maximizing any immediate value. How intelligent a system is then becomes a measure of how far into the future it can model and effectively optimize entropy. (updated with pdf link)
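To make the "optimize future entropy" idea concrete, here is a toy sketch, not the paper's method. It assumes a 1-D box with clamping walls and scores each first move by the Shannon entropy of where a uniformly random walk ends up after `horizon` further steps (an endpoint entropy, which is a much cruder stand-in for the entropy over whole future paths). The names `step`, `endpoint_entropy`, and `choose_action` are made up for illustration.

```python
import math

ACTIONS = (-1, 0, 1)  # step left, stay, step right


def step(pos, action, size):
    """Move within a box of `size` cells; walls clamp the position."""
    return min(max(pos + action, 0), size - 1)


def endpoint_entropy(start, horizon, size):
    """Shannon entropy (bits) of the position after `horizon` uniformly random steps."""
    dist = {start: 1.0}
    for _ in range(horizon):
        nxt = {}
        for pos, p in dist.items():
            for a in ACTIONS:
                q = step(pos, a, size)
                nxt[q] = nxt.get(q, 0.0) + p / len(ACTIONS)
        dist = nxt
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def choose_action(pos, horizon, size):
    """Pick the first move whose reachable future is most spread out (toy assumption)."""
    return max(
        ACTIONS,
        key=lambda a: endpoint_entropy(step(pos, a, size), horizon, size),
    )
```

With these assumptions, an agent sitting against a wall prefers to step away from it, since the wall folds future positions together and lowers the entropy. A longer `horizon` lets the agent "feel" the walls from farther away, which is roughly the sense in which lookahead distance measures capability here.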
[1]: Thermodynamic Game Theory: https://adamilab.msu.edu/wp-content/uploads/AdamiHintze2018....
[2]: piKL - KL-regularized RL: https://arxiv.org/abs/2112.07544
[3]: Soft-Actor Critic - Entropy-regularized RL: https://arxiv.org/abs/1801.01290
[4]: "Soft" (Boltzmann) Q-learning = Entropy-regularized policy gradients: https://arxiv.org/abs/1704.06440