by lolinder 517 days ago
A small weakness in this test is that one of the keys to strategic Codenames play is understanding your partner. You're not just trying to connect the words, you're trying to connect them in a way that will be obvious to your partner. As a computing analogy: you're trying to serialize a few cards in a way that will be deserializable by the other player.

This test pairs o1 with itself, which means the serializer is the deserializer. So while it's impressive that it can link 4 words, most humans could also easily link 4 with as much stretching! We just don't tend to because we can't guarantee that the other human will make the same connections we did.


lol I played this game with my family and they said my wife and I were cheating because I kept using inside jokes that made no sense to them but she would get immediately.
That's a big part of what makes this game enjoyable - a clue that is very obvious to one person might not even cross the mind of someone else. To anyone reading this who hasn't played, it's definitely worth giving it a try.
Agreed, big fan of Codenames in general, but it's at its best when you’re playing against or alongside people you’ve known for a while. The metagaming aspect of tailoring clues to who your partner is really takes it to the next level.
Same for Taboo for me. It's why we married.
Stretching? Never! I see your 4-clue, o1, and raise you “QUEUE” for 5:

  - Line (Standing in the queue…)
  - London (they’re all queued up, innit?)
  - Log (*backend distsys handwaving*)
  - Mail (what do you think an inbox is, anyway?!)
  - Round (homophone “Q” is a typographically round letter)
I think Round may be invalid but in any case I would not have gotten it.
Thanks for the comment. I actually tried explicitly mentioning in the prompt that 'Your guesser follows the same reasoning process', but this did not produce any clear improvement. Maybe I should've done more prompt engineering.
Nah, prompt engineering wouldn't have solved the fundamental issue, which is that the associations between ideas as stored in the weights will be the same between the two AI players, which makes it an easier game for them than for a human equivalent. It'd be like two copies of you playing on a team, having shared all the same experiences right up until the moment the game starts.

And don't get me wrong, it's still a fun experiment! It's just that that 4 would never have worked if a human played against another human—there are simply too many other words that would be equally strongly associated:

* Gum: Gum is often wrapped in paper, so 'GUM' is strongly associated with the word 'PAPER'.

* King: King is a type of face card, which are printed on paper, so 'KING' is strongly associated with the word 'PAPER'. (Repeat for JACK.)

* Light: Paper is a lightweight material.

That's 4 others right there that are at least as closely connected in my head as LAWYER or LOG. The only reason why o1 pulled up the same four when guessing as it did when clueing is that it's the same model.

Again, I didn't mean this as a knock, just a warning about drawing too many conclusions from the test!
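
To make the shared-weights point concrete, here is a toy sketch. The association strengths below are invented for illustration (they are not measured from o1 or from any person), and the names `CLUE_GIVER`, `OTHER_HUMAN`, `guess` and `hits` are hypothetical. The guesser simply picks the board words it associates most strongly with the clue PAPER.

```python
# Hypothetical association strengths (0..1) between the clue PAPER and
# some board words. All numbers are made up for illustration only.
CLUE_GIVER = {
    "LAWYER": 0.70, "MAIL": 0.90, "LOG": 0.60, "LINE": 0.65,
    "GUM": 0.40, "KING": 0.35, "JACK": 0.35, "LIGHT": 0.30,
}
# A different player with different experiences weights the same words differently.
OTHER_HUMAN = {
    "LAWYER": 0.30, "MAIL": 0.90, "LOG": 0.20, "LINE": 0.50,
    "GUM": 0.70, "KING": 0.60, "JACK": 0.60, "LIGHT": 0.55,
}

# The four words the clue-giver intended.
TARGETS = {"LAWYER", "MAIL", "LOG", "LINE"}


def guess(associations, n):
    """Return the n board words most strongly associated with the clue."""
    ranked = sorted(associations, key=associations.get, reverse=True)
    return set(ranked[:n])


def hits(guesser_associations):
    """Count how many intended targets the guesser recovers."""
    return len(guess(guesser_associations, len(TARGETS)) & TARGETS)


if __name__ == "__main__":
    print("self-play hits:", hits(CLUE_GIVER))      # guesser shares the clue-giver's table
    print("mixed-pair hits:", hits(OTHER_HUMAN))    # guesser has their own table
```

With these made-up numbers, the guesser that shares the clue-giver's table recovers all four targets, while the other guesser only recovers MAIL. The real tables are of course unknown; the sketch only shows why identical associations make the round trip easy.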

When I saw those 4 words I thought of "letter" or "writing". (But I likely wouldn't have thought of that cluster while scanning the full board.)

I think "paper" is a great clue, and those 4 words lawyer/mail/log/line match better than gum/king/light.

There's an even better reason for "lawyer/paper" than ChatGPT gave: lawyers "serve papers".

That we disagree on this is exactly why who you're playing with matters. I'd have never gotten to lawyer, certainly wouldn't have connected log. Line is a very faint possibility. Mail is the only one I'd have gotten for sure.
Ehhh I don’t think that’s accurate. The problem is not linking 4 words. It’s linking 4 words without accidentally triggering other, semantically adjacent words.

This task could probably be solved nearly just as well with old school word2vec embeddings
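
A minimal sketch of what an embedding-based clue picker might look like, assuming cosine similarity as the association measure and a "weakest target minus strongest non-target" score. The 3-dimensional vectors are hand-made placeholders, not real word2vec output, and `clue_score` and `best_clue` are hypothetical names. It says nothing about how well this works on a real board.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clue_score(clue_vec, target_vecs, avoid_vecs):
    """Reward closeness to every target, penalize closeness to any word to avoid."""
    weakest_target = min(cosine(clue_vec, t) for t in target_vecs)
    strongest_avoid = max((cosine(clue_vec, a) for a in avoid_vecs), default=-1.0)
    return weakest_target - strongest_avoid


def best_clue(candidates, target_vecs, avoid_vecs):
    """Pick the candidate clue word with the highest score."""
    return max(candidates, key=lambda w: clue_score(candidates[w], target_vecs, avoid_vecs))


# Placeholder vectors (assumption: a real system would load trained embeddings).
BOARD = {
    "MAIL": (1.0, 0.2, 0.0),
    "LINE": (0.8, 0.4, 0.1),
    "GUM": (0.1, 1.0, 0.0),
}
CANDIDATES = {
    "PAPER": (0.9, 0.5, 0.0),
    "POST": (1.0, 0.0, 0.1),
}

if __name__ == "__main__":
    targets = [BOARD["MAIL"], BOARD["LINE"]]
    avoid = [BOARD["GUM"]]
    print(best_clue(CANDIDATES, targets, avoid))
```

With these placeholder vectors, PAPER sits closer to LINE than POST does, but it also sits much closer to GUM, so the penalty term makes POST the pick. That is the "semantically adjacent words" problem expressed as a score.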

Right, that's what I meant to be getting at: when you connect 4 words with as much stretching as o1 did there, you're running a real risk that the other party connects a different set. Unless that other party is also you and has the same learned connections at top of mind.
> This task could probably be solved nearly just as well with old school word2vec embeddings

I've tried. This approach is well beyond awful.

I see a few papers published that did exactly this successfully. It also just sounds crazy that it wouldn’t work well.

It’s odd to me that you would confidently claim it’s “beyond awful”.

I'm confidently relaying my experience. But I get that I was extremely terse and overly general in my reply.

I haven't surveyed all the papers, although I have read some. And all the ones that I've seen that work okay -- do so by using a language graph or word association graph in their algorithm. Not just embeddings. Even then the results don't look good to me compared to human performance.

Why does it sound crazy that it wouldn't work well? Have you used word embeddings much? Maybe you have and have good reason to think this - I don't mean to imply otherwise. But it doesn't sound crazy to me that it wouldn't work well.

If I am wrong I would love to know it.