by riku_iki
1537 days ago
> it does on "chained inference" tasks
To me, it is more proof of "stochastic parrot" behavior: the model has seen most of the math material available on the internet and, even with significant computational power, can solve only 58% of elementary-school-level questions. Those were probably the ones with clear examples in the training data, and it can't generalize beyond them.
The process kinda goes like this:
Think of ten answers to this question: blah blah blah
From these ten answers, which are the best 3?
Of the three answers, which is the best?
Revise and edit the best answer to be simpler or more understandable.
Prompt engineering is a nascent field, and we haven't seen nuanced or sophisticated use of the tool yet. Most of the metrics reported in papers are barely better than a naive Turing test. It doesn't take much introspection to know that even humans endlessly iterate and revise their output, and the best extemporaneous speech doesn't match well-curated and edited material. It shouldn't surprise us that similar editing and revision processes will benefit transformer output.
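For anyone who wants to try it, here is a rough sketch of the four-step process above as a chain of prompts. It is only an illustration: `complete` is a hypothetical stand-in for whatever model call you have (prompt string in, reply string out), and `iterate_and_revise` is a name made up for this example, not part of any library.

```python
from typing import Callable

# Assumption: `complete` wraps your model of choice. It takes a prompt
# string and returns the model's text reply. Nothing here is a real API.
Complete = Callable[[str], str]


def iterate_and_revise(question: str, complete: Complete) -> str:
    """Chain four prompts: brainstorm ten, shortlist three, pick one, revise."""
    # Step 1: ask for ten candidate answers.
    candidates = complete(
        f"Think of ten answers to this question: {question}"
    )
    # Step 2: feed the candidates back and ask for the best three.
    shortlist = complete(
        "From these ten answers, which are the best 3?\n\n"
        f"Question: {question}\n\nAnswers:\n{candidates}"
    )
    # Step 3: narrow the shortlist down to a single answer.
    best = complete(
        "Of the three answers, which is the best?\n\n"
        f"Question: {question}\n\nAnswers:\n{shortlist}"
    )
    # Step 4: ask for an edited, simpler version of the winner.
    return complete(
        "Revise and edit the best answer to be simpler or more understandable.\n\n"
        f"Question: {question}\n\nAnswer:\n{best}"
    )


if __name__ == "__main__":
    # Dummy "model" that echoes the first line of each prompt, just to
    # show that the chain runs end to end without a real model.
    def echo_model(prompt: str) -> str:
        return prompt.splitlines()[0]

    print(iterate_and_revise("Why is the sky blue?", echo_model))
```

Each step costs one extra model call, so this trades compute for quality. Whether that trade pays off on a given task is something you would have to measure.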