by CuriouslyC 870 days ago
If you're trying to prove the model has reasoning abilities, ask it the question in a language other than English. Even better, give it multiple sentences in different languages and tell it to answer the question without first translating them.

That's not a great metric, and it's going to be heavily language dependent. For example, many European languages share a lot of vocabulary and structure, so it should be unsurprising that a model trained mostly on English does pretty well in French and German. But if you ask it in a language that is fairly disjoint from English (say, Chinese), then either you're held back by how little of that language is in the training data, or, if there is enough of it, you run into the exact same issue as before.

It's definitely a metric worth trying, but we also have to recognize its extreme limits. Evaluation is hard, and the better our models perform, the harder evaluation becomes. Anyone saying otherwise is likely trying to sell you something.