by chmod775
677 days ago
you: 4/15
gpt-4o: 0/15
gpt-4: 1/15
gpt-4o-mini: 2/15
llama-2-7b: 2/15
llama-3-8b: 3/15
mistral-7b: 4/15
unigram: 1/15
Seems like none of us is really better than flipping a coin, so I'd wager that you cannot accurately predict the next word with the given information.

If one could instead sort the answers by likelihood and get scored based on how high one ranked the correct answer, things would probably look better than random.

Also, I wonder how these LLMs were prompted. Were they just used to complete the text, or were they put in a "mood" where they would try to complete the text in the original author's voice? Obviously, as a human I'd try to put myself in the author's head and emulate their way of speaking, whereas an LLM might just complete things in its default voice.
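To make the ranking idea concrete, here is a rough sketch of one way such a score could work: mean reciprocal rank, where each question earns 1 divided by the position at which the correct answer was ranked. This is only one possible scheme, not something the quiz does; the function names and the sample data are made up for illustration.

```python
def reciprocal_rank(ranked, correct):
    """Return 1 / (1-indexed position of the correct answer in a best-first ranking)."""
    return 1.0 / (ranked.index(correct) + 1)


def mean_reciprocal_rank(rankings, answers):
    """Average the reciprocal rank over all questions."""
    scores = [reciprocal_rank(r, a) for r, a in zip(rankings, answers)]
    return sum(scores) / len(scores)


# Hypothetical example: two questions, three candidate words each.
rankings = [["a", "b", "c"], ["b", "a", "c"]]  # guesses, most likely first
answers = ["a", "a"]                           # the actual next words
mrr = mean_reciprocal_rank(rankings, answers)  # first place, then second place
```

A score like this gives partial credit for near misses, so a guesser who usually puts the right word second would land clearly above a random ordering.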
|
The language models were prompted with the text + each candidate answer, and the candidate with the lowest perplexity was picked. I tried to avoid instruction-tuned models wherever possible to avoid the "voice" problem.
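For anyone curious what that selection looks like in code, here is a minimal sketch. It is not the actual evaluation script: a toy add-one-smoothed unigram model stands in for a real language model, perplexity is assumed to be computed over the whole text plus candidate (that detail is not stated above), and all names are hypothetical.

```python
import math
from collections import Counter


def perplexity(logprobs):
    """Perplexity is exp of the negative mean token log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))


def make_unigram_scorer(corpus, alpha=1.0):
    """Build a toy unigram scorer with additive smoothing (stand-in for a real LM)."""
    tokens = corpus.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    vocab = len(counts) + 1  # one extra slot for unseen words

    def token_logprobs(text):
        return [
            math.log((counts[t] + alpha) / (total + alpha * vocab))
            for t in text.lower().split()
        ]

    return token_logprobs


def pick_lowest_perplexity(context, candidates, token_logprobs):
    """Score context + candidate for each candidate and keep the lowest perplexity."""
    return min(
        candidates,
        key=lambda c: perplexity(token_logprobs(f"{context} {c}")),
    )


# Hypothetical usage with a tiny made-up corpus.
scorer = make_unigram_scorer("the cat sat on the mat the cat slept")
choice = pick_lowest_perplexity("the cat sat on the", ["mat", "zebra"], scorer)
```

With a real model, `token_logprobs` would be replaced by a function returning the model's per-token log-probabilities for the full string; the selection step stays the same.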