Hacker News
by zoklet-enjoyer 677 days ago
You scored 6/15. The best language model, gpt-4o, scored 6/15. The unigram model, which just picks the most common word without reading the prompt, scored 2/15.

Keep in mind that you took 204 seconds to answer the questions, whereas the slowest language model was llama-3-8b taking only 10 seconds!
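
For anyone wondering what a unigram baseline like the one described above amounts to, here is a minimal sketch. It is not the quiz's actual implementation: the toy corpus, the example prompts, and the names `fit_unigram`, `unigram_predict`, and `score` are all made up for illustration. The point is only that the prediction ignores the prompt entirely.

```python
from collections import Counter

def fit_unigram(corpus_tokens):
    """Return the single most frequent token in the training tokens."""
    counts = Counter(corpus_tokens)
    return counts.most_common(1)[0][0]

def score(predict, examples):
    """Count how many (prompt, answer) pairs the predictor gets right."""
    return sum(predict(prompt) == answer for prompt, answer in examples)

# Hypothetical toy corpus; a real baseline would use a large text sample.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
most_common = fit_unigram(corpus)

def unigram_predict(prompt):
    # The prompt is deliberately ignored: same guess every time.
    return most_common

# Hypothetical next-word questions: (prompt, correct next word).
examples = [
    ("the cat sat on", "the"),
    ("and the dog", "sat"),
    ("sat on the", "rug"),
]

unigram_score = score(unigram_predict, examples)
print(f"unigram: {unigram_score}/{len(examples)}")
```

Because the guess never changes, such a baseline only scores when the correct next word happens to be the most frequent word overall.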

1 comment

    you: 8/15
    gpt-4o: 2/15
    gpt-4: 4/15
    gpt-4o-mini: 4/15
    llama-2-7b: 5/15
    llama-3-8b: 5/15
    mistral-7b: 6/15
    unigram: 5/15
> You scored 8/15. The best language model, mistral-7b, scored 6/15. The unigram model, which just picks the most common word without reading the prompt, scored 5/15.

(In about 120 seconds, I think; I didn't copy that part.)

Interesting that results differ this much between runs (for the LLMs).

Surely someone did better than me on their first run?

Edit: I wonder if human scores correlate with the age of the HN account?