HN Mirror

by mirekrusin 811 days ago

It also has further-reaching consequences.

It creates a foundation for reinforcement learning without human feedback, a missing piece of the puzzle.

Simplifying: propose a plausible theorem, try to find a proof, reinforce the reasoning path that led to it, move the proved statement into the axioms, repeat.
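To make the loop concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the rule table `RULES`, the helper names `propose`, `try_prove` and `self_improve`, and the use of a plain counter as a stand-in for "reinforcing" a reasoning path. A real system would use a proof assistant as the verifier and a learned model as the proposer.

```python
from collections import Counter

# Hypothetical toy system: a rule is (premises, conclusion).
RULES = [
    (frozenset({"a"}), "b"),
    (frozenset({"a", "b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"z"}), "e"),  # never fires: "z" is not derivable
]


def propose(known, rules):
    """Propose candidate theorems: rule conclusions that are not known yet."""
    return sorted({concl for _, concl in rules} - known)


def try_prove(candidate, known, rules):
    """Return the index of a rule proving candidate in one step, else None."""
    for i, (premises, concl) in enumerate(rules):
        if concl == candidate and premises <= known:
            return i
    return None


def self_improve(axioms, rules, max_rounds=10):
    known = set(axioms)
    reward = Counter()  # stand-in for reinforcing the successful path
    for _ in range(max_rounds):
        progress = False
        for candidate in propose(known, rules):
            rule = try_prove(candidate, known, rules)
            if rule is not None:
                reward[rule] += 1     # reinforce what worked
                known.add(candidate)  # proved statement joins the axioms
                progress = True
        if not progress:              # nothing new was provable: stop
            break
    return known, reward


known, reward = self_improve({"a"}, RULES)
```

Starting from the single axiom `a`, the loop ends with `a`, `b`, `c` and `d` known, and `e` left unproved because nothing derives `z`. The point is that the reward signal comes from the verifier alone, with no human in the loop.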

(Super)intelligence has many dimensions. One of the less explored ones is exploiting concurrency in chains of thought. It's very unnatural to us, but there is a lot to gain if you can branch out, pursue different directions at the same time, and collect feedback from both the dead ends and the progress made.
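A rough sketch of that branch-and-collect idea, again in Python and again with made-up pieces: the number puzzle, the `MOVES` table, `TARGET`, and the functions `explore` and `branch_and_collect` are all hypothetical. Each first move becomes its own branch, the branches run concurrently, and both successes and dead ends are reported back. Threads are used only to show the structure, not to claim any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy problem: reach TARGET from a start value using these moves.
MOVES = {
    "double": lambda x: x * 2,
    "add3": lambda x: x + 3,
    "square": lambda x: x * x,
}
TARGET = 14


def explore(first_move, start, depth=2):
    """Follow one branch: apply first_move, then search depth-first below it."""

    def dfs(value, path, remaining):
        if value == TARGET:
            return path
        if remaining == 0 or value > TARGET:  # every move grows the value
            return None
        for name, fn in MOVES.items():
            found = dfs(fn(value), path + [name], remaining - 1)
            if found is not None:
                return found
        return None

    return first_move, dfs(MOVES[first_move](start), [first_move], depth)


def branch_and_collect(start):
    """Run one branch per first move concurrently, then gather all feedback."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda move: explore(move, start), MOVES))
    solved = {move: path for move, path in results if path is not None}
    dead_ends = [move for move, path in results if path is None]
    return solved, dead_ends


solved, dead_ends = branch_and_collect(start=4)
```

With a start value of 4, the `double` and `add3` branches each find a path to 14, while `square` overshoots immediately and is reported as a dead end. That dead-end list is the kind of feedback a sequential chain would have had to pay for with backtracking.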