|
|
|
|
|
by mschwaig
838 days ago
|
|
From the Model Card on Needle In A Haystack evaluation > One aspect that has caught our attention while examining samples from Claude 3 Opus is that, in certain instances, the model demonstrates a remarkable ability to identify the synthetic nature of the task, and acknowledges that the needle was most likely not part of the original document. As model capabilities continue to advance, it is crucial to bear in mind that the contrived nature of this particular task could potentially
become a limitation. Here is an example full response from the model: >> is the most relevant sentence in the documents:
"The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association."
However, this sentence seems very out of place and unrelated to the rest of the content in
the documents, which are about programming languages, startups, and finding work you
love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was
paying attention, since it does not fit with the other topics at all. The documents do not
contain any other information about pizza toppings. |
|