It's important to note that Winograd Schemas don't really test whether the system understands those sentences; they essentially test whether the system has the appropriate "common sense" knowledge/experience about how our world and society work, i.e., whether it understands whatever other data sources can be used to find out about the topic.

To give the proper answer in the example you use, a human (or a system) needs to know how such permits are issued and what the common reasons for refusing them are. As such, a sufficiently sophisticated pattern matching system is perfectly adequate for answering such questions - there's a simple pattern difference: fearing violence causes you to refuse permits, but advocating violence causes you to get refused. It's worth thinking about where humans learn this. For Winograd schemas like putting a trophy in a suitcase, it's the basic childhood experience of putting stuff in boxes, which we all share but a machine won't (unless it's raised as a child-robot). For schemas like this one, it's an understanding of how our society works, learned by participating in it for years, which we all share but a machine won't (unless we allow machines to participate in our society). I.e., it's not so much a measure of intelligence as a measure of shared background experiences. A human from a hunter-gatherer tribe wouldn't be able to answer the councilman-permit schema, but that doesn't mean he/she isn't intelligent.
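To make the "simple pattern difference" concrete, here's a toy Python sketch. It assumes the usual wording of the schema ("The city councilmen refused the demonstrators a permit because they feared/advocated violence"); the `PRONOUN_ROLE` table and `resolve_they` function are made up for illustration and hand-coded for this one sentence, so this is not a real Winograd solver. The point is only that the answer comes from a piece of background knowledge, not from parsing the sentence.

```python
# Hypothetical, hand-written "common sense" table for one schema:
# given the verb after "they", which party does "they" refer to?
PRONOUN_ROLE = {
    "feared": "refuser",     # whoever fears violence is the one refusing the permit
    "advocated": "refused",  # whoever advocates violence is the one being refused
}


def resolve_they(refuser: str, refused: str, verb: str) -> str:
    """Pick the referent of 'they' using the background-knowledge table."""
    role = PRONOUN_ROLE[verb]  # raises KeyError when the knowledge is missing
    return refuser if role == "refuser" else refused


parties = ("the city councilmen", "the demonstrators")
print(resolve_they(*parties, "feared"))
print(resolve_they(*parties, "advocated"))
```

A system without the right entry in the table fails with a `KeyError`, which is roughly the position of someone who has never seen how permits work: nothing wrong with the reasoning, just missing knowledge.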

The difficulty is caused mainly by the need for domain-specific knowledge across a wide range of domains - we will perceive systems as "dumb" unless they share the background knowledge that most humans have gained by being part of our society and through basic schooling. Since machines won't do that (yet), we're looking for "unnatural" ways of acquiring common sense knowledge without the direct experimentation and participation that we rely on.