by berndi
961 days ago
You’re confused about what “statistical parrot” means, and you don’t seem to understand the difference between an optimization objective and the resulting model. The term “parrot” is used to imply inference by something akin to a look-up table; specifically, it indicates poor out-of-sample performance and the lack of a proper world model. The optimization objective is irrelevant when determining the generalization performance of a model and when judging whether it can reason beyond looking up answers in a table. As the user above noted, it is now quite well established that GPT-4 has impressive out-of-sample performance, which can be explained by it possessing an actual model of the world and not being a “parrot”.
Err... I can show this is false, kinda trivially. People who engage in prompt-confirmation-bias aren't aware of what the in-sample is.
It's basically everything ever digitised: you can ask it for anything from the first paragraph of every Dickens novel to the average petal length of an iris flower, etc.
How are you measuring the in-sample here?
If you engage in straightforward reasoning from first principles, and are basically aware of what the training data is, you can show in 10 seconds critical failures of generalisation.
If you want a recipe: go find some fringe API docs. Establish that it has been trained on them. Then, since they're fringe, there won't be much code on GitHub, etc. Now ask it to do something non-trivial with that API. It will fail, and the mechanism will be obvious: it'll jam in correlated code that lacks relevance.
Do the same on a popular API, and see it succeed.
The in-sample will be obvious for both, and so will the boundary of generalisation.
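If you'd rather score the recipe than eyeball it, here's a rough sketch of a harness. Everything in it is my own assumption for illustration: the `Task` fields, the "fringe"/"popular" labels, the `pass_rates` helper, and especially `toy_generate`, which is just a canned look-up standing in for a real model call. The toy numbers it produces say nothing about any actual model; swap in your own model call and your own checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    group: str                    # hypothetical label: "fringe" or "popular"
    prompt: str                   # the non-trivial request sent to the model
    check: Callable[[str], bool]  # True if the returned code is acceptable


def pass_rates(tasks: List[Task], generate: Callable[[str], str]) -> Dict[str, float]:
    """Return the fraction of tasks passed within each group."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for task in tasks:
        totals[task.group] = totals.get(task.group, 0) + 1
        if task.check(generate(task.prompt)):
            passed[task.group] = passed.get(task.group, 0) + 1
    return {group: passed.get(group, 0) / n for group, n in totals.items()}


def toy_generate(prompt: str) -> str:
    # Placeholder for a real model call (assumption): a canned look-up
    # that only "knows" one prompt and emits unrelated code otherwise.
    canned = {"popular: parse a date": "parse_date(s)"}
    return canned.get(prompt, "unrelated_call()")


# Hypothetical tasks; real checks would run or lint the generated code.
tasks = [
    Task("popular", "popular: parse a date", lambda out: "parse_date" in out),
    Task("fringe", "fringe: open a session", lambda out: "open_session" in out),
]

rates = pass_rates(tasks, toy_generate)
print(rates)
```

With a real model behind `generate` and a decent number of tasks per group, the gap between the two rates is a crude proxy for where the boundary of generalisation sits.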