Hacker News
by maister 1149 days ago
> But it's incorrect to say that they pass every test.

Interesting! So there is published evidence that it cannot pass novel Winograd schemas? Could you please provide me with the sources? I would like to have a look at the prompts. In my experience, you can make GPT-4 pass novel Winograd tests by giving it additional context.


I'm not sure if you're joking or trolling, but on the chance that you're not:

>So there is published evidence that it cannot pass novel Winograd schemas?

There is no published evidence that they can, which is what I said.

>Could you please provide me with the sources?

I cannot prove a negative to you with a source. You will have to review the literature like I did.

>In my experience, you can make GPT-4 pass novel Winograd tests by giving it additional context.

This is why I am pretty sure you're trolling. This is a lot like saying "You can make GPT-4 do prime factorisation of a semiprime by noting the constituent primes." The point of testing the model is to see if the model can give you the answer, not to give it the answer. The point of Winograd schemas as a test is that they test the ability to resolve pronouns using common sense, without additional context. The reason it's a good test is that every competent human being can pass it, and as yet no computer can.

Not everyone on the internet is trolling, my fellow human.

> This is why I am pretty sure you're trolling. This is a lot like saying "You can make GPT-4 do prime factorisation of a semiprime by noting the constituent primes." The point of testing the model is to see if the model can give you the answer, not to give it the answer.

It is not "giving the answer to the model" if you adjust the prompt with additional instructions like "this is a test question" or "use common sense". It is about creating the correct frame of context. With these kinds of adjustments, GPT-4 passes every novel Winograd schema I have presented it with.

If that's actually true, please publish it. A blog post will do; it'll probably get popular and force a scientist to reproduce it.