by zoklet-enjoyer
677 days ago
You scored 6/15. The best language model, gpt-4o, scored 6/15. The unigram model, which just picks the most common word without reading the prompt, scored 2/15. Keep in mind that you took 204 seconds to answer the questions, whereas the slowest language model was llama-3-8b taking only 10 seconds!
(In 120 seconds, I think; I didn't copy that part.)
Interesting that results differ this much between runs (for the LLMs).
Surely someone did better than me on their first run?
Edit: I wonder if the human scores correlate with the age of the HN account?