by visarga
1537 days ago
This paper shows we can combine models like Lego bricks, even without end-to-end training, by using language as the intermediate representation. That means more flexibility in training the models, each on its own dataset, and more ways to combine them. By avoiding fine-tuning, the models may retain their robustness to distribution shifts.
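To make the "Lego bricks" idea concrete, here is a minimal sketch, assuming each model can be treated as a plain text-to-text function. The names `TextModel`, `compose`, `toy_captioner` and `toy_language_model` are hypothetical stand-ins for illustration, not anything from the paper.

```python
from typing import Callable

# Assumption: a "model" is any function from text to text. Real systems
# would wrap a captioner, a language model, a retriever, and so on.
TextModel = Callable[[str], str]


def compose(*stages: TextModel) -> TextModel:
    """Chain separately trained models; each stage reads the previous stage's text."""
    def pipeline(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return pipeline


# Toy stand-ins so the sketch runs without any real model.
def toy_captioner(image_id: str) -> str:
    return f"a photo of a dog ({image_id})"


def toy_language_model(prompt: str) -> str:
    return f"Q: what is shown? A: {prompt}"


describe = compose(toy_captioner, toy_language_model)
print(describe("img_001"))
```

Because the only interface is a string, either stage can be swapped out without retraining the other.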
|
GPT-3-type models are very good at selecting for arbitrary qualities from a list of options. Generating a list of 10 potential answers, then running prompts on the candidates to select for quality, accuracy, style, and so forth, resembles the cyclic formulation of ideas in humans. The process used to write essays and articles (draft, edit, revise, simplify, repeat until satisfied) can be implemented trivially. Those processes will transfer to larger models, and things like RETRO reduce the resources required by orders of magnitude.
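A rough sketch of the generate-then-select step, assuming two hypothetical callables: `generate` returns a list of candidate answers and `score` rates one candidate against one criterion. In practice both would be prompts to a model; the toy versions below exist only so the example runs.

```python
from typing import Callable, List


def best_of_n(
    generate: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    prompt: str,
    criteria: List[str],
) -> str:
    """Generate candidates, then keep the one with the highest summed score."""
    candidates = generate(prompt)

    def total(candidate: str) -> float:
        return sum(score(candidate, criterion) for criterion in criteria)

    return max(candidates, key=total)


# Toy stand-ins (assumptions, not a real model API).
def toy_generate(prompt: str) -> List[str]:
    return [
        f"{prompt} are modular.",
        f"{prompt} can be snapped together like lego bricks.",
        "No idea.",
    ]


def toy_score(candidate: str, criterion: str) -> float:
    if criterion == "concise":
        return -len(candidate) / 10  # shorter is better
    if criterion == "on-topic":
        return 10.0 if "Models" in candidate else 0.0
    return 0.0


winner = best_of_n(toy_generate, toy_score, "Models", ["concise", "on-topic"])
print(winner)
```

Summing the per-criterion scores is just one simple choice; a weighted sum or a pairwise comparison prompt would slot into the same shape.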
"Cognitive architecture" seems to be an accurate descriptor for the use of multiple models plus the logic layers for many-shot, many-model development.
Zero-shot output may not be human level, but how many humans produce human-level output in their stream of consciousness? The act of consideration, recursing over an idea and refining it, is achievable with these models in a way that humans can debug and tweak from cycle to cycle.
Multipass "consideration" and revision methodologies can capture almost any metacognitive process used by humans, whether it's the Socratic method, the AP style guide, or an arbitrary jumble of rules derived from 4chan posters.
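The multipass revision loop could look something like the sketch below. It assumes two hypothetical callables, `critique` (returns a complaint string when a rule is violated, else `None`) and `rewrite` (applies the complaints); with real models both would be prompts. The draft history is kept so a human can inspect each cycle.

```python
from typing import Callable, List, Optional, Tuple


def revise(
    draft: str,
    rules: List[str],
    critique: Callable[[str, str], Optional[str]],
    rewrite: Callable[[str, List[str]], str],
    max_passes: int = 5,
) -> Tuple[str, List[str]]:
    """Critique the draft against every rule and rewrite until no complaints remain."""
    history = [draft]
    for _ in range(max_passes):
        complaints = []
        for rule in rules:
            complaint = critique(draft, rule)
            if complaint is not None:
                complaints.append(complaint)
        if not complaints:
            break  # "satisfied"
        draft = rewrite(draft, complaints)
        history.append(draft)
    return draft, history


# Toy stand-ins (assumptions): two mechanical style rules instead of model prompts.
def toy_critique(draft: str, rule: str) -> Optional[str]:
    if rule == "no double spaces" and "  " in draft:
        return "collapse double spaces"
    if rule == "end with a period" and not draft.endswith("."):
        return "add a final period"
    return None


def toy_rewrite(draft: str, complaints: List[str]) -> str:
    for complaint in complaints:
        if complaint == "collapse double spaces":
            draft = " ".join(draft.split())
        elif complaint == "add a final period":
            draft += "."
    return draft


final, history = revise(
    "models  compose like bricks",
    ["no double spaces", "end with a period"],
    toy_critique,
    toy_rewrite,
)
print(final)
print(len(history), "drafts")
```

Swapping the rule list is all it takes to go from a style guide to the Socratic method, at least in this simplified framing.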