Hacker News new | ask | show | jobs
by tsm 1306 days ago
I have no idea how they actually work, but if I were asked to design one:

• Choose some reasonable number of questions of the form "Is it ___"?. Let's say 256

• Come up with a list of objects, and for each one give it a 256-long bitvector encoding its answers to the questions

• Maintain a set (implemented as another bitvector) of the potential items. Figure out which question would divide the set in two most closely; ask that question.

I am the opposite of a hardware hacker or systems programmer, but it seems like this is algorithmically straightforward to implement with bit-twiddling.

3 comments

One problem with this explanation of how it might work is that it appears to know far more than 256 things - though I don't know how many. And even if it did know only that many, how do you construct the set of questions?

Actually programming the logic once you know it is the easy bit, it's constructing the dataset of answers and questions in the first place.

This approach allows for 2^256 different things, one for each possible yes/no bitstring.
Ah yes, apologies. Still, big job to construct the database of what things correspond to what answers to the questions - and to choose the questions.
> Figure out which question would divide the set in two most closely; ask that question.

This depends on the goal. One goal might be to answer all questions as quickly as possible, in which case partitioning the search space in two might be a good strategy. Another goal might be to have the best chance of guessing the item within 20 questions, in which case you will want to choose questions which maximise the % of the remaining search space you can uniqely identify with the remaining questions (perhaps weighted by popularity of that item).

I think you'd probably start with a dense matrix of questions x answers populated by humans. But I imagine there's a clever preprocessing step that's used to build the optimum tree of questions, avoiding getting caught in a local maximum and significantly reducing the data you'd have to store.