Hacker News new | ask | show | jobs
by Syzygies 1611 days ago

    5.291 soare
    5.294 roate
    5.299 raise
    5.311 raile
    5.311 reast
    5.321 slate
    5.342 crate
    5.342 salet
    5.345 irate
    5.346 trace
    5.356 arise
    5.360 orate
    5.370 stare
    5.382 carte
    5.390 raine
    5.400 caret
    5.402 ariel
    5.406 taler
    5.406 carle
    5.407 slane
Shown are the twenty best initial guesses using Claude Shannon's definition of information entropy. Each number is the expected number of yes/no questions needed to resolve the remaining uncertainty. Shannon is the "father of information theory", and this is the right measure.

One might recognize SOARE as identified elsewhere by a different measure. I am relieved that I give up very little by moving down to the first word I recognize on this list.

This takes about 20 minutes to code in Ruby, and four minutes to run. There's no point to using a more efficient language, or wasting part of an hour as I did searching for a clever way to score guess words.

In the 1960's my dad used entropy to program Jotto on Kodak's computers. It wasn't feasible then to evaluate every possible clue word, but he determined that one did well enough with a random subset.

6 comments

Maximising the information gained using Shannon's entropy is a very good strategy (assuming the goal is to minimise the expected number of guesses), however it is not necessarily optimal!

I have a counter example for a simplified version the game with the following rule changes:

1. The player is only told which letters in the guess are correct (i.e. they are not told about letters that are present but in a different location).

2. If the player knows there is only one possible solution, the player wins immediately (without having to explicitly guess that word).

3. The set of words that the player is allowed to guess may be disjoint from the set of possible solutions.

Here is the list of possible solutions:

    aaaa
    aaab
    aaba
    babb
    abaa
    bbab
    bbba
    bbbb
(There are 8 words. The 2nd, 3rd and 4th letters are the binary patterns of length 3, and the 1st letter is a carefully chosen "red herring".)

Here is the dictionary of words the player is allowed go guess:

    axxx
    xaxx
    xxax
    xxxa
(Each guess effectively lets the player query a single letter of the solution.)

The information gain for each possible initial guess is identical (all guesses result in a 4-4 split), so a strategy based on information gain would have to make an arbitrary choice.

If the initial guess is axxx (the "red herring"), the expected number of guesses is 3.25.

But a better strategy is to guess xaxx (then guess xxax and xxxa). The expected number of guesses is then 3.

(In this example information gain was tied, but I have a larger example where the information gain for the "red herring" is greater than the information gain for the optimal first guess.)

Interesting. There can't be a proof of general optimality for Shannon entropy, because the words are irregularly distributed. However, (unlike your lists) they're not distributed by an adversary trying to foil Wordle/Jotto strategies.

I suspect a law of large numbers / central limit theorem type result that Shannon entropy is asymptotically optimal for randomly chosen lists, even those generated by state machines like gibberish generators that nearly output English words. In other words, I conjecture that your configurations are rare for long lists.

Early in my career, I was naive enough to code up Grobner bases with a friend, to tackle problems in algebraic geometry. I didn't yet know that computer scientists at MIT had tried random equations with horrid running times, and other computer scientists at MIT had established special cases with exponential space complete complexity. Our first theorem explained why algebraic geometers were lucky here. This is a trichotomy one often sees: "Good reason for asking" / "Monkeys at a keyboard" / "Troublemakers at a demo".

Languages evolve like coding theory, attempting a Hamming distance between words to enhance intelligibility. It could well be that the Wordle dictionary behaves quasirandomly, more uniformly spaced that a true random dictionary, so Shannon entropy behaves better than expected.

For the entropy strategy, the expected number of guesses on the full dictionary (allowing all 12972 words as possible solutions) is 4.233 (54910/12972)* according to my tests.

The best score on https://botfights.io/game/wordle is currently 4.044 (4044/1000).

*Spoiler: The score of the entropy strategy can be improved to 4.086 (53007/12972) by tweaking the entropy formula as follows: Let n be the number of possible solutions and let ki be the size of the i-th partition after the current guess. The usual entropy formula is sum{pi * -log(pi)} where pi = ki / n. The improved formula is sum{pi * -log((ki+1) / n)}. This formula aligns more closely with the expected number of guesses required to solve small partitions.

A fork using primes: Primel

https://converged.yt/primel/

The best first guess using Shannon entropy is a dead heat between 12953 and 12539. Not surprisingly to the number theorists out there...

If you study the JavaScript, apparently this guy's girlfriend knows every prime.

I don't think this list is accurate, because the expected number of guesses for the top words should be fewer than 5. I'll post something in more detail later, but the best guesses on the 12,972 5-letter words have an expected number of guesses of around 4.3.
It's a units issue. I'm measuring in yes/no questions, which is the native scale for Shannon entropy. You're measuring in Wordle questions. There's more information in a Wordle answer than in a yes/no answer.
> wasting part of an hour as I did searching for a clever way to score guess words.

I'd love to see scoring a Wordle guess added to one of those programming problem sites, as code golf in any language would be amusing. I was once a commercial APL programmer, so APL comes to mind.

Would you mind sharing your code?
Which word list did you use?
I believe tares is the optimal word (according to shannon entropy) from the game’s list.
They made a mistake. There are 2,315 relatively familiar words that can be the secret word, and 12,972 words in all that are accepted as guesses.

Using the full list as potential guess words, TARES is optimal for a random secret word from the full list, and SOARE is optimal for a random secret word from the short list.

Neither of these words appear in the short list. If one sticks to the short list for both guesses and secrets, RAISE is optimal.

Would you mind (or anyone) mind sharing github code to the Shannon method? I'd simply like to learn from it
Here is someone's write-up (with code) using the entropy method: https://jluebeck.github.io/posts/WordleSolver
The game has two lists -- the guessable words list, and there's also a subset which is the possible answers list (this is also provided to the players, I think maybe in order, even -- it is just for fun after all, no need to prevent cheating). IMO it is better to stick to the guessable words list.