by JoelEinbinder 677 days ago
I made a little game/quiz where you try to guess the next word in a bunch of Hacker News comments and compete against various language models. I used llama2 to generate three alternative completions for each comment, creating a multiple-choice question. For the local language models that you are competing against, I treat each one as having picked the answer with the lowest total perplexity of prompt + answer. I can replicate this behavior with the OpenAI models by setting a logit_bias that limits the LLM to picking only one of the allowed answers. I also tried giving the full multiple-choice question as a prompt and having the model pick an answer, but that led to really poor results. So I can't compare against Claude or any other online LLMs that don't have logit_bias.
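For anyone curious what "lowest perplexity of prompt + answer" might look like in practice, here is a minimal sketch. It is not the code behind the quiz: the `logprob_fn` interface, the function names, and the toy unigram model are all assumptions made for illustration. A real setup would get per-token log-probabilities from the language model itself.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not the quiz's actual code):
# given a token sequence, return one natural-log probability per token.
LogProbFn = Callable[[Sequence[str]], List[float]]


def perplexity(tokens: Sequence[str], logprob_fn: LogProbFn) -> float:
    """Perplexity = exp of the negative mean log-probability per token."""
    logprobs = logprob_fn(tokens)
    return math.exp(-sum(logprobs) / len(logprobs))


def pick_answer(prompt: Sequence[str], choices: Sequence[str],
                logprob_fn: LogProbFn) -> str:
    """Score prompt + each candidate word and return the lowest-perplexity one."""
    scores = [perplexity(list(prompt) + [choice], logprob_fn)
              for choice in choices]
    return choices[scores.index(min(scores))]


def toy_unigram_logprobs(tokens: Sequence[str]) -> List[float]:
    """Toy stand-in for a model: fixed word probabilities that ignore context."""
    probs = {"the": 0.2, "cat": 0.05, "sat": 0.05, "on": 0.1, "mat": 0.04}
    # Unknown words get a small fallback probability.
    return [math.log(probs.get(token, 0.001)) for token in tokens]


prompt = ["the", "cat", "sat", "on", "the"]
print(pick_answer(prompt, ["mat", "zebra", "quark"], toy_unigram_logprobs))  # mat
```

With single-word answers appended to the same prompt, every candidate sequence has the same length, so picking the lowest perplexity is the same as picking the highest total log-probability.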

I wouldn't call the quiz fun exactly. After playing with it a lot, I think I can consistently get more than 50% of the questions right. I've slowed down a lot on each question, which I think LLMs have trouble doing.

2 comments

"This exercise helped me to understand how language models work on a much deeper level."

I'd like to hear more on this.

It's an interesting test, pretty cool idea. Thanks for sharing