by isolli 1724 days ago
"But, unlike more than 200 human solvers, it wasn’t perfect on all of the puzzles: It got waylaid on two of them and finished with errors."

I'm curious, what kind of error are we talking about? Words that don't exist, or another solution to a problem that may not have a unique solution?

4 comments

I wonder how it does on puzzles where the answers need to be written in unusual ways? Some examples from the New York Times puzzles.

- There was one with a name that suggested an Alice in Wonderland connection, and it had an answer "THE LOOKING GLASS" (no spaces) running vertically down the full length of the center of the grid.

Every across answer that was entirely to the left of that was written normally. Every across answer entirely to the right was written backwards. Every across answer that crossed the center was a palindrome centered on the center.

- There was one where several answers were triple Spoonerisms of well known phrases.

For example, the answer "THE STUCK HOPS BEER" for the clue "Tagline in an ad for Elmer's Glue-Ale". Rotate the ST from STUCK, the H from HOPS, and the B from BEER and you get "THE BUCK STOPS HERE".

- I remember one that had a few isolated black squares, and a theme that suggested the puzzle had something to do with roundabouts.

Those black squares were roundabouts. Answers would hit the roundabout and continue after a 90 degree turn.

- I remember one where the theme was something like "What goes up must come down". That had several answers that, like the roundabout one above, would take a 90 degree turn from across, but it was a left turn, so after the turn they went up. Then there would be a down answer whose start was the reversed end of that across answer; it would come down, also make a left turn, and continue across to the right.
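
To make the mirror rule from the Looking Glass example concrete, here is a rough sketch of how a solver might decide what to write in the grid for one across answer. The function name `written_form`, the 0-indexed columns, and the sample words are all assumptions for illustration, not anything taken from the actual puzzle.

```python
def written_form(answer: str, start_col: int, center_col: int) -> str:
    """Return the letters as they would be entered in the grid, left to right.

    Assumed rule (from the description above):
      - entirely left of the center column: written normally
      - entirely right of the center column: written backwards
      - crossing the center column: must be a palindrome centered on it
    Columns are 0-indexed; this convention is an assumption of the sketch.
    """
    end_col = start_col + len(answer) - 1
    if end_col < center_col:
        return answer            # left side: unchanged
    if start_col > center_col:
        return answer[::-1]      # right side: mirrored
    # The answer crosses the center column.
    centered = (center_col - start_col) == (end_col - center_col)
    if not centered or answer != answer[::-1]:
        raise ValueError(f"{answer!r} cannot cross the mirror at column {center_col}")
    return answer


print(written_form("CAT", 0, 7))    # left of the mirror
print(written_form("CAT", 8, 7))    # right of the mirror, so reversed
print(written_form("LEVEL", 5, 7))  # palindrome centered on column 7
```

A solver that only knows the normal rules would reject every entry on the right half, which is probably why themes like this are hard to handle automatically.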

Those are great. Another memorable NYT one had the theme A SHOT IN THE DARK, and several answers ended in SHOT (RIMSHOT, EARSHOT, etc.), but the grid only had space for the first part of the word, so the actual answers were RIM⬛, EAR⬛, etc. Another had a RISE FROM THE ASHES theme, and had several answers that ended in ASH that did not seem to fit the clue: the actual answer followed vertically where the ASH ending started. E.g., for the clue "Tell it like it is", the answer that fit straight across was "TALKS TRASH", but above the "A", going up, were I, G, H, and T, giving the actual answer "TALK STRAIGHT".
Non-existent words should be easy to check for. My guess is answers not (quite) matching the clues. Analogies and cultural references are probably hard for AI to get: is an "empire fighting knight" historical figure or Jedi? One level of such referencing is doable, 2-3 likely very hard.
> Non-existent words should be easy to check for.

I'm interested in your magical solution because this is a hard problem.

You should have given more detail on why this problem is deceptively hard. I am guessing that the simple solution of looking up the word in the dictionary seems to work ok (especially in the context of an artificial competition, which doesn't have to accept uncommon spellings, words in other languages, etc.), but still breaks down hard because of proper names, which are common in crosswords.
I think the point is that "Non-existent words should be easy to check for" is a useless metric, because plenty of answers are non-existent words. Proper names and things like that, many of which the computer may not have in its database.

So it can't simply blindly reject answers that are non-existent words.

An example of how this computer could make a mistake: Sometimes two proper names cross at a vowel. If you don't know either name, you sometimes have to guess blindly at the answer. (This exact scenario is rare in crosswords like the Times, but it does occasionally come up, and similar scenarios exist.)

I would definitely be interested in knowing why it is hard. And yes, I was thinking about lookup: for example from (1) a dictionary, (2) a set of proper names and (3) a set of previous crossword answers. While nothing is perfect, this (from an armchair) seems like it should work pretty well. And I am not proposing it as the main part of the algorithm, just a check on non-words.
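
A minimal sketch of that kind of lookup check, assuming three hypothetical word sets (`DICTIONARY`, `PROPER_NAMES`, `PAST_ANSWERS`) with toy contents. A real solver would load far larger lists, and a miss here only flags an entry for a second look; it does not prove the entry wrong.

```python
# Toy word lists; the set names and their contents are illustrative assumptions.
DICTIONARY = {"TALK", "STRAIGHT", "RIM", "EAR"}
PROPER_NAMES = {"CLINTON", "BOBDOLE"}
PAST_ANSWERS = {"THEBUCKSTOPSHERE"}


def normalize(entry: str) -> str:
    """Uppercase the entry and keep letters only, since grid entries have no spaces."""
    return "".join(ch for ch in entry.upper() if ch.isalpha())


def is_known(entry: str) -> bool:
    """True if the normalized entry appears in any of the three lists."""
    word = normalize(entry)
    return word in DICTIONARY or word in PROPER_NAMES or word in PAST_ANSWERS


def suspicious_entries(entries):
    """Return the entries found in none of the lists, in their original order."""
    return [e for e in entries if not is_known(e)]


print(suspicious_entries(["Bob Dole", "XQZ", "rim"]))
```

Any proper name missing from the lists would be flagged just like a genuine non-word, so the output is better treated as a hint than as a verdict.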
There is no dictionary that contains "all existing words" for any given language.
It depends a lot on what you mean by a language. If you define 'the English language' as 'all of the words that some number of people who identify as speaking English would understand', then of course there is no dictionary that would cover that (but by this definition, many words in the English language are Indian, Chinese, Romanian, Russian, etc., and would be completely incomprehensible to the vast majority of people in the USA or England). On the other hand, many people define the concept of a 'correct English word' as 'any word with a definition in the OED or Merriam Webster (ignoring proper nouns)', and leave other words as being 'wrong/foreign language'.

Either way, this is all moot when discussing a crossword puzzle contest, which explicitly limits itself to words in a specific dictionary + proper nouns. The problem of proper nouns is still extreme, and brings down the whole idea, but at least the problem of recognizing 'all possible common nouns that could be present in the crossword contest' is simple.

A similar thing happens even with human solvers from time to time. It's possible to solve a puzzle with answers that match the clue but aren't what was intended. Of course, the more you let yourself deviate from the clue, the more possibilities exist.

My favorite example of this is the NY Times crossword from 1996-11-05, the day of the presidential election, with the clue "Lead story in tomorrow's newspaper (!), with 43-Across". The puzzle worked with either "CLINTON" or "BOBDOLE" in the grid.

Also, as a novice to crosswords, how do puzzle makers ensure that they have a unique solution?
Related, this crossword designed to have two solutions: https://imgur.com/Atd5Dry
In a traditional cryptic crossword each individual word has at least two independent things in the clue referring to it, for example, in an unrealistically simple case, "it means X" and "it's an anagram of Y", so each individual answer is very likely to be unique. In addition, alternate letters of each answer are part of another word, so the whole thing is extremely likely to have a unique solution.

However, with a combination of a bad clue and a bit of bad luck it does sometimes happen that an experienced crossword solver has to take a guess before sending in their solution. I'm not an experienced crossword solver, but I've seen them at work, and I would guess that finding a good alternative solution happens less than one time in 50. That's just a guess, though. For a better estimate one should systematically compare the solutions produced by competent crossword solvers with the official solutions.

In a crossword puzzle, each word shares letters with at least two other words, likely many more. So, without any effort from the puzzle designer, it is very unlikely that all of the words have ambiguous enough solutions that also match the letter count, etc. Even a crossword where half the clues are 'any word' will have a good chance of having a unique solution overall.

Of course, people don't like to play crosswords like sudoku, blindly matching letters, so there does have to be some skill to the clue design to leave a relatively small number of plausible answers, if not an entirely unique one.
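
As a toy illustration of how crossings prune ambiguity, here is a sketch that keeps only the candidate pairs agreeing on the shared square. The function name `compatible_pairs` and the candidate words are made up for the example.

```python
def compatible_pairs(across_cands, down_cands, across_idx, down_idx):
    """Keep (across, down) pairs whose letters agree on the shared square.

    across_idx and down_idx are the 0-indexed positions of the shared
    square within the across word and the down word respectively.
    """
    return [
        (a, d)
        for a in across_cands
        for d in down_cands
        if a[across_idx] == d[down_idx]
    ]


# Three plausible across answers and two plausible down answers,
# crossing at the across word's second letter and the down word's first.
pairs = compatible_pairs(["CAT", "COT", "CUT"], ["OWL", "ELK"], 1, 0)
print(pairs)
```

With a single crossing, six possible combinations drop to one here; a full grid applies a check like this at every shared square.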

They don’t necessarily. Sometimes this is done on purpose at harder difficulties: the clueing is intentionally vague to lead you down blind alleys.