by Eridrus
89 days ago
Right, but if these things are so rare that we all only know the one viral example, I feel like that lends credence to the models generally not having this problem. Researchers built the Winograd Schema Challenge more than a decade ago to assess common sense reasoning, and LLMs beat that challenge around GPT-4.