by modeless 677 days ago
> You scored 11/15. The best language model, llama-2-7b, scored 10/15.

I see that you get a random quiz every time, so results aren't comparable between people. I think I got an easy one. Neat game! If you could find a corpus that makes it easy for average humans to beat the LLMs, and add some nice design, maybe a Wordle-style daily challenge plus social sharing, I could see it going viral just as a way for people to "prove" that they are "smarter" than AI.

1 comment

Given the high scores, I guess it was an easy one. I took the longer one and got the following:

> You scored 28/100. The best language model, gpt-4, scored 32/100. The unigram model, which just picks the most common word without reading the prompt, scored 28/100.

Assuming difficulty averages out at N=100, the best model's 32/100 scales to about 5/15 (15 × 0.32 = 4.8). So any 15-question quiz where the LLM scores well above ~5 is an "easy" one.
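
For anyone who wants to check the arithmetic, here is a rough Python sketch. It rescales the 32/100 result to a 15-question quiz, then asks how likely 10/15 or better would be if every question were an independent coin flip at that same 32% hit rate. The independence and equal-difficulty assumptions are mine, not something the quiz guarantees, and the function names are made up for illustration.

```python
from math import comb


def expected_score(correct: int, total: int, n: int) -> float:
    """Rescale a score of correct/total to a quiz with n questions."""
    return n * correct / total


def tail_probability(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Assumes each question is answered correctly independently with
    probability p (a simplification, since real quizzes vary in difficulty).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Per-question accuracy of the best model on the 100-question quiz.
p_long = 32 / 100

# Score we would expect on a 15-question quiz at the same accuracy.
print(expected_score(32, 100, 15))

# Chance of 10/15 or better at that accuracy, under the assumptions above.
print(tail_probability(15, p_long, 10))
```

If the second number comes out small, that supports the guess that the 15-question quiz where the model got 10/15 was an unusually easy draw rather than a typical one.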