Hacker News
by e12e 677 days ago

    you: 8/15
    gpt-4o: 2/15
    gpt-4: 4/15
    gpt-4o-mini: 4/15
    llama-2-7b: 5/15
    llama-3-8b: 5/15
    mistral-7b: 6/15
    unigram: 5/15
> You scored 8/15. The best language model, mistral-7b, scored 6/15. The unigram model, which just picks the most common word without reading the prompt, scored 5/15.
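
For anyone curious what that unigram baseline amounts to, here's a rough sketch in Python. The toy corpus, the rounds, and the function names are all made up for illustration; this is not the game's actual code, just the "always guess the most common word, ignore the prompt" idea as the quote describes it.

```python
from collections import Counter


def unigram_guess(corpus_tokens):
    """Return the single most frequent token in the corpus."""
    counts = Counter(corpus_tokens)
    token, _ = counts.most_common(1)[0]
    return token


def score(guess_fn, rounds):
    """Count the rounds where the guess equals the true next token."""
    return sum(1 for prompt, answer in rounds if guess_fn(prompt) == answer)


# Hypothetical toy data, not taken from the game.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
rounds = [
    ("the cat sat on", "the"),
    ("and the dog", "sat"),
    ("sat on the", "rug"),
]

best = unigram_guess(corpus)
# The baseline never looks at the prompt: it returns the same token every round.
unigram_score = score(lambda _prompt: best, rounds)
print(f"unigram guess: {best!r}, score: {unigram_score}/{len(rounds)}")
```

A baseline like this still scores above zero because very common tokens really are the right answer a fair share of the time.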

(In 120 seconds, I think. I didn't copy that part.)

Interesting that results differ this much between runs (for the LLMs).

Surely someone did better than me on their first run?

Edit: I wonder if the human scores correlate with the age of the HN account?