Hacker News new | ask | show | jobs
by Imnimo 1611 days ago
I think there are two natural optimization criteria.

One is to try to maximize the expected information gain of your guess. If we have an initial set of 128 possible words, we begin with log2(128) = 7 bits of entropy. When we make a guess and receive a response, we narrow down the list of words to the set compatible with that response. If, for example, there are 32 words compatible with our response, then we now have log2(32) = 5 bits of entropy, and our guess was worth 2 bits. For a given guess, there are many possible replies each with its own information gain - in the best case, we get all greens, and are left with 0 bits of uncertainty (for a gain of 7), while in other cases we may get all greys, and learn comparatively little. Further, each reply has its own probability of occurring - all greens is only 1/128, but other replies might be more likely if there are several possible targets that would generate the same reply. Thus, we weight the information gain of each reply by its probability to arrive at the expected information gain for that guess. For the word list provided in the article, I get TARES as the best first guess by this metric.

The second strategy is to continue down the game tree, and find the guess with the lowest number of expected (or alternatively the lowest worst-case) number of guesses. In principle, we might find that while TARES gains us a lot of information as a first guess, it leaves us without a good second guess (since we are restricted to guessing real words, and those words might all be redundant in some way), and thus our total expected number of guesses is larger than if we had taken a slightly less informative first guess. My hunch is that in practice this sort of situation is unlikely to occur, and the best first guess by this metric is probably similar to that of the first metric, although I haven't tried.