by modeless
677 days ago
> You scored 11/15. The best language model, llama-2-7b, scored 10/15.

I see that you get a random quiz every time, so results aren't comparable between people. I think I got an easy one. Neat game! If you could find a corpus that makes it easy for average humans to beat the LLMs, and add some nice design, maybe a Wordle-style daily challenge plus social sharing, etc., I could see it going viral just as a way for people to "prove" that they are "smarter" than AI.
> You scored 28/100. The best language model, gpt-4, scored 32/100. The unigram model, which just picks the most common word without reading the prompt, scored 28/100.
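Assuming difficulty averages out over N=100, the best LLM gets about 32% right, which works out to roughly 5 correct on a 15-question quiz. So a small quiz where the LLM scores well above ~5 is probably an "easy" one.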
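
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes every question is independent and that the 32/100 rate applies equally to each one, which is my simplification and not something the game states. The function names are made up for illustration.

```python
from math import comb

def expected_score(accuracy, n_questions):
    """Expected number of correct answers at a fixed per-question accuracy."""
    return accuracy * n_questions

def prob_at_least(k, n, p):
    """Binomial tail: chance of k or more correct out of n independent questions."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: reuse the 32/100 long-run rate as the per-question accuracy.
llm_accuracy = 32 / 100

# Expected score on a 15-question quiz (about 4.8, i.e. the "~5" above).
print(expected_score(llm_accuracy, 15))

# Chance of hitting 10/15 or better if the quiz were of average difficulty.
print(prob_at_least(10, 15, llm_accuracy))
```

If the second number comes out small, a 10/15 model score is more likely a sign of an easy quiz than of luck, though with only 15 questions the variance is large either way.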