Hacker News new | ask | show | jobs
by Someone 5191 days ago
I do not think that is correct. Let's say that, at some time, you know the word has an E, with 99% probability at position 1 and with 1% at position 2. You also know there is 50% chance that the word has an O. I would guess the E, because it comes free, even though it is not the one that gives the most information.

You must somehow weigh information gain against risk.

2 comments

Assuming there is always a letter with a 50% chance and we choose wrong 11 times in a row. One more and game over. At this point we have 11 Bits of information, which is enough to distuingish between 2048 words.

Actually, we should be correct 50% of the time. Which means 22 Bits of information or 4194304 words.

Additionally, we know the length of the word.

The english dictionaries seem to have between 400k and 1000k words [0] of all word sizes. With 22 Bits we get 4000k words. We do not have to worry about getting hanged using the information-reduction algorithm. ;)

[0] http://hypertextbook.com/facts/2001/JohnnyLing.shtml

I have to correct myself. You do not want a 50% chance, because the answer does not provide 1Bit of information. If we get it right, we also know the position(s) of the letter.

More precisely: Given a dictionary of all possible words and a letter to guess, we can split the dictionary into k parts. One sub-dictionary contains all words for which the answer is negative. Additionally, we have k-1 sub-dictionaries for each equivalence class of letter positions.

For each letter, we can compute the partition. Now we need to choose the strategically best partition. I believe it should be the one with "the lowest average sub-dictionary size".

The simplest way I know of to do that is by a weighted linear combination. Both information gain and risk are quantifiable; one in bits and the other in probability (or log-odds which is also bits). Then the problem reduces to finding appropriate weights, which can be done with your favorite n-dimensional search algorithm.

It may also be useful to consider how many remaining incorrect guesses there are in the game.