by CuriouslyC
870 days ago
If you're trying to prove the model has reasoning abilities, ask it the question in a language other than English. Even better, give it multiple sentences in different languages and tell it to answer the question without first translating them.

It's definitely a metric worth trying, but we must also recognize its extreme limits. Evaluation is quite difficult, and the better our models perform, the more difficult evaluation becomes. Anyone saying otherwise is likely trying to sell you something.