Hacker News
by jameshart 1724 days ago
You might not need to codify the rules; you could instead create a tagged training set that includes extra information, such as a ‘parts of speech’ breakdown of how each piece of a clue relates to the answer (anagramSignifier, anagramMaterial, association, etc.).
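To make the idea concrete, here is a minimal sketch of what one tagged training example might look like. The role names (anagramSignifier, anagramMaterial, association) are the ones from the comment; the record format, the `anagram_is_consistent` sanity check and the sample clue are hypothetical, invented for illustration only.

```python
from dataclasses import dataclass


# Hypothetical record format: a clue split into spans, each tagged with the
# role it plays in producing the answer.
@dataclass(frozen=True)
class TaggedSpan:
    text: str
    role: str  # e.g. "anagramSignifier", "anagramMaterial", "association"


@dataclass(frozen=True)
class TaggedClue:
    clue: str
    answer: str
    spans: tuple[TaggedSpan, ...]

    def spans_with_role(self, role: str) -> list[str]:
        """Return the text of every span tagged with the given role."""
        return [s.text for s in self.spans if s.role == role]


def anagram_is_consistent(example: TaggedClue) -> bool:
    """Hypothetical check that the tagged anagram material rearranges to the answer."""
    material = "".join(example.spans_with_role("anagramMaterial"))
    letters = sorted(c for c in material.upper() if c.isalpha())
    return letters == sorted(example.answer.upper())


# Made-up example clue: "silent" is the anagram material, "shuffling" signals
# the anagram, and "Pay attention to" is the definition (association).
example = TaggedClue(
    clue="Pay attention to silent shuffling (6)",
    answer="LISTEN",
    spans=(
        TaggedSpan("Pay attention to", "association"),
        TaggedSpan("silent", "anagramMaterial"),
        TaggedSpan("shuffling", "anagramSignifier"),
    ),
)

print(anagram_is_consistent(example))  # True
```

A corpus of records like this gives a learner the role labels directly, so nothing about how anagrams work has to be written as an explicit rule; the consistency check is only there to catch mislabeled training data.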

A learner trained on that set could probably do reasonably well at picking up those patterns, even to the point of recognizing that a word it has never seen used as an anagram signifier might be playing that role in a particular clue.

On the other hand, as far as I’m aware, machine translation has moved away from using tagged parts of speech, and the models have nonetheless developed enough internal structure that they behave as if they had learned part-of-speech tagging. It’s possible that a model trained on a corpus of cryptic clues could develop the same kind of hidden representations.


That's still codifying the rules, just doing it through the data rather than in preprocessing.