Hacker News new | ask | show | jobs
by wchargin 2352 days ago
Hey, I wrote one of these, too! Mine generates and solves all possible puzzles (with my dictionary, there are 54733) in about 1.8 seconds on my six-year-old laptop, and can also typeset them to LaTeX+TikZ to make printable puzzle sheets. That’s 0.03 milliseconds per puzzle. It turns out that you can get this thing to run quite fast once you realize that you can pack words into bitsets and express the whole algorithm in terms of bitwise operations on machine words. Of particular interest is that the running time is output-sensitive: proportional to the number of solutions to the puzzle but not the total size of the dictionary (after one-time preprocessing).

I used the “hard copy” rule set as used in the printed New York Times, which is slightly different from the web rule set: words must be at least five letters long, not four, and are worth 1 point each, or 3 for a pangram. The hard copy puzzles also include three score thresholds, so I had a bit of fun trying to reverse-engineer how those thresholds are chosen. I didn’t get exactly the right function, but I got fairly close, and most importantly the thresholds feel fair when playing. (The online version also has score thresholds, but there are many more of them, and it was easy for me to transcribe thresholds from the archives of the printed copies.)

In my admittedly biased yet assuredly humble opinion, both the algorithm performance and the rating threshold estimation are interesting: https://github.com/wchargin/spelling-bee/tree/master#perform...

Oh, and a web-based solver, for convenience: https://wchargin.github.io/spelling-bee/

Lovely to see how the author of this article and I both had a lot of fun with this by taking it in different directions. :-)

2 comments

Very nice! I also took a bitwise approach and ended up with something that solved all possible puzzles in 5 seconds (in 60 lines of matlab). A technique I used that helped bring the time down was to calculate the points for a given puzzle irregardless of the center letter. This provides an upper bound for the points that could come from that puzzle. After you've done that for all the boards, you sort by upper bound and then start calculating the actual scores for the different center letter options. Doing this for the 538 puzzle, I only had to look at center letters for the top 2 puzzles 'aeginrt' and 'adeinrt' (after those you find a score that is greater than the upper bound for any of the other puzzles).
Its amazing the speedups if you have the right model for the problem!
Very cool. I also wrote a solver using the NYT magazine's rules back in 2018, and found the bitwise method of matching both 1 and 3 point values incredibly fast. I just precalculated the hashes and ran a parallel loop over the whole array, but I know there are specialized data structures that games like Scrabble use. I wonder if that would have sped things up even more.

Anyway, whenever I see a problem that involves letters now, I reach for bit operators first.