Pure LLMs probably can't go vastly beyond human intelligence, because they rely on imitating humans (human text). I think there are still one or two breakthroughs missing before robotics is solved.
PaLM already includes some LLM-generated training (consensus of different approaches), and these kinds of synthetic self-driven training metrics will only get more sophisticated and more effective at improving capabilities. It's conceivable that we will start seeing AlphaZero-like improvement curves in reasoning.
I'm not sure how consensus would get you significantly above human baseline. Doesn't that just get you some sort of average?
The basic problem with synthetic self-training is that we need some reward function which tells us whether a given synthetic example is good. In the case of AlphaGo Zero, a good example was a synthetic strategy which won the game or scored highly, which can be detected automatically. But how do we automatically recognize that synthetic text has "high quality"?
One case where it might work is proofs in a formal proof language, which can be checked automatically by software. So if a language model is tasked with generating synthetic conjecture/proof pairs, it is possible to automatically recognize the correct ones and use them as self-training data (whether unsupervised, supervised, or reinforcement learning, I'm not sure), enabling it to recursively create more complex synthetic proofs.
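To illustrate what "checked automatically" means here, this is a tiny conjecture/proof pair in Lean 4. It is only a sketch: the theorem name is made up for this example, and a real self-training setup would of course target far harder statements.

```lean
-- Conjecture: adding zero on the right leaves a natural number unchanged.
-- Proof: `rfl`, which the checker accepts because `n + 0` reduces to `n`
-- by the definition of addition on `Nat`.
theorem add_zero_right (n : Nat) : n + 0 = n := rfl

-- A false conjecture such as `n + 1 = n` with the same "proof" would be
-- rejected by the checker, so it would never enter the training data.
```

The point is that the accept/reject decision needs no human judgment, which is exactly what a reward signal requires.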
A very similar approach (with some sort of unit tests instead of proofs) is described here in more detail:
https://arxiv.org/abs/2207.14502
It's been a while since I read it, so my description above is kinda fuzzy. It might involve some adversarial step that I missed.
One problem is getting this process off the ground (bootstrapping). That is difficult, since we need some baseline capability first to create any successful synthetic examples, and there aren't a lot of human-created formal proofs which can be used as bootstrapping training data.
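To make the unit-test variant a bit more concrete, here is a minimal Python sketch of just the filtering step: run candidate programs against tests and keep only the ones that pass. This is my own illustration, not the paper's actual method. `CANDIDATES`, `TESTS`, `passes_tests`, and `filter_verified` are all hypothetical names, and the hard-coded candidate strings stand in for samples from a model.

```python
# Hypothetical stand-in for model samples: each string is a candidate
# implementation of a function named `solve` (here: sort a list).
CANDIDATES = [
    "def solve(xs):\n    return sorted(xs)",                # correct
    "def solve(xs):\n    return xs",                        # wrong: no sorting
    "def solve(xs):\n    return sorted(xs, reverse=True)",  # wrong order
    "def solve(xs)\n    return sorted(xs)",                 # syntax error
]

# Hypothetical unit tests as (input, expected output) pairs.
TESTS = [([3, 1, 2], [1, 2, 3]), ([], []), ([2, 2, 1], [1, 2, 2])]


def passes_tests(source, tests):
    """Return True only if `source` defines `solve` and passes every test."""
    namespace = {}
    try:
        # NOTE: exec on untrusted code is unsafe; a real system would need
        # a sandbox and timeouts. This is only a toy.
        exec(source, namespace)
        solve = namespace["solve"]
        return all(solve(list(inp)) == expected for inp, expected in tests)
    except Exception:
        # Syntax errors, missing `solve`, and runtime errors all count as failure.
        return False


def filter_verified(candidates, tests):
    """Keep only the candidates that the automatic check accepts."""
    return [src for src in candidates if passes_tests(src, tests)]


verified = filter_verified(CANDIDATES, TESTS)
```

Only the verified candidates would be fed back as training data, so the tests play the role that "won the game" plays for AlphaGo Zero.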
Another problem is that, even if it worked, this system would just be good at generating proofs. Maybe there is some amount of transfer to natural language intelligence, but I'm not sure about that.
If you have a different idea for creating a reward signal, I would be interested to hear how it could be done.