| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Buttons840 1159 days ago

Ah we're starting to bootstrap.

For decades in reinforcement learning we've had Q learning, which promises to solve any optimization problem if only we can build a powerful enough function approximator. It can even learn off-policy, meaning it can just watch from the sideline and find the optimal solution. It works for toy problems, and it works in theory, theres even formal proofs that it will work given infinite time and resources, and yet in practice it often becomes unstable and collapses.

Supervised learning is one thing, having a model remain stable while bootstrapping through a complex environment is another. GTP is supervised learning, so far, let's see if it can bootstrap.