by mirekrusin
811 days ago
It also has further-reaching consequences. It creates a foundation for reinforcement learning without human feedback, a missing piece of the puzzle. Simplifying: propose a plausible theorem, try to find a provable solution, reinforce the reasoning/solution path, move the proved statement into the axioms, repeat. (Super)intelligence has many dimensions. One of the less explored ones is exploiting concurrency in thought chains. It's something very unnatural to us, but there is a lot to gain if you're able to branch and collect feedback both from dead ends and from progress in different directions being taken at the same time.
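
To make the propose/prove/reinforce/promote loop concrete, here is a rough toy sketch in Python. Everything in it is an assumption for illustration: a "statement" is just an integer, the two rules in `RULES` stand in for inference rules, `try_prove` is a bounded breadth-first search standing in for a real prover, and "reinforcement" is only a counter of which rules appeared on successful paths. It is not how any actual system does this.

```python
import random
from collections import Counter

# Toy stand-in for a formal system (all names here are hypothetical):
# a "statement" is an integer, and it counts as proved if it can be reached
# from an already known statement in at most `budget` rule applications.
RULES = {
    "double": lambda x: 2 * x,
    "add_three": lambda x: x + 3,
}


def try_prove(target, known, budget):
    """Breadth-first search for a rule path from any known statement to target.

    Returns the list of rule names used, or None if nothing fits the budget.
    """
    frontier = [(k, []) for k in sorted(known)]
    for _ in range(budget):
        next_frontier = []
        for value, path in frontier:
            for name, rule in RULES.items():
                new_value, new_path = rule(value), path + [name]
                if new_value == target:
                    return new_path
                next_frontier.append((new_value, new_path))
        frontier = next_frontier
    return None


def self_improve(axioms, rounds, budget, seed=0):
    """Run the propose -> prove -> reinforce -> promote loop for `rounds` steps."""
    rng = random.Random(seed)
    known = set(axioms)
    rule_credit = Counter()  # crude "reinforcement": credit rules on winning paths
    for _ in range(rounds):
        # 1. Propose a plausible statement (here: a random number near what is known).
        target = rng.randint(1, 2 * max(known) + 3)
        if target in known:
            continue
        # 2. Try to find a proof within the budget.
        proof = try_prove(target, known, budget)
        if proof is None:
            continue  # dead end: nothing to reinforce this round
        # 3. Reinforce the path that worked.
        rule_credit.update(proof)
        # 4. Promote the proved statement so later proofs can start from it.
        known.add(target)
    return known, rule_credit
```

The point of step 4 shows up even in this toy: `try_prove(5, {1}, 1)` returns `None`, but once `2` has been promoted, `try_prove(5, {1, 2}, 1)` finds a one-step proof. The sketch is purely sequential; the branching/concurrency idea is not modeled here.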