Hacker News
by radford-neal 1290 days ago
The problem is that "I'm not sure" has only a few synonyms, like "I don't know", but the correct answer to a complex question can be phrased in many ways. For instance, "How do owls catch mice?" could be answered by "Researchers in Britain have found...", or "Owls in Europe...", or "Bird claws can be used to...", or "Mice are often found in...", etc. Even if the model "knows" the answer with high probability, it could be that any particular way of expressing that knowledge is less likely than an expression of ignorance.
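
To put rough numbers on that, here is a minimal sketch. All the figures are made up for illustration (90% of the probability mass on correct answers, spread evenly over 50 phrasings, and 10% on two expressions of ignorance); nothing here comes from a real model.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
p_knows = 0.9        # total probability mass on correct answers
n_phrasings = 50     # assumed number of distinct correct phrasings
p_ignorance = 0.1    # total mass on expressions of ignorance
n_ignorance = 2      # "I'm not sure" and "I don't know"

# Mass per individual response, assuming an even split within each group.
p_each_correct = p_knows / n_phrasings
p_each_ignorance = p_ignorance / n_ignorance

# Build the full distribution over individual responses.
dist = {f"correct phrasing {i}": p_each_correct for i in range(n_phrasings)}
dist["I'm not sure"] = p_each_ignorance
dist["I don't know"] = p_each_ignorance

# The single most likely response is an expression of ignorance,
# even though correct answers hold 90% of the total mass.
most_likely = max(dist, key=dist.get)
print(most_likely, p_each_ignorance, p_each_correct)
```

With these assumed numbers, each correct phrasing gets 0.9 / 50 = 0.018 while each expression of ignorance gets 0.1 / 2 = 0.05, so picking the single most probable response yields "I'm not sure" despite the model "knowing" the answer.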

And besides that technical issue, since a GPT-style model is trained to mimic the training data, it is _supposed_ to say "I don't know" with a certain probability that reflects how many people commenting on the matter don't know, even when there are other people who do know. That's not what you want in a system for answering questions.

The enterprise is fundamentally misguided. A model for predicting the next word as a person might produce it is not a reliable way of obtaining factual information, and trying to "fix" it to do so is bound to fail in mysterious ways - likely dangerous ways if it's actually used as a source of facts.

In contrast, there are many ways that a GPT-style model could be very useful, doing what it is actually trained to do, particularly if the training data were augmented with information on the time and place of each piece of training text. For example, an instructor could prompt with exam questions, to see what mistakes students are likely to make on that question, or how they might misinterpret it, in order to create better exam questions. Or if time and place were in the training data, one could ask for a completion of "I saw two black people at the grocery store yesterday" in Alabama/1910 and California/2022 to see how racial attitudes differ (assuming that the model has actually learned well). Of course, such research becomes impossible once the model has been "fixed" to instead produce some strange combination of actual predictions and stuff that somebody thought you should be told.