by TeMPOraL 3692 days ago
> Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence.

Isn't the core observation about natural language that humans don't parse it at all? Grammar is a secondary, derived construct that we use to give language some stability; I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.

Anyway, some impressive results here.


Various syntactic theories (HPSG, GPSG, minimalism, construction grammars) from linguistics are certainly derived constructs, but most researchers would agree that they all reflect real abstractions that humans make. I think the NLP community has done a good job of harvesting the substantive aspects (which tend to be fairly conventionalized across theories) without overfitting on specific cases. "Alice drove down the street in her car" is easy for people to process; "The horse raced past the barn fell" is not, because it requires a pretty drastic reinterpretation of the structure when you get to the last word.

That said, there is some interesting work on "good-enough" language processing, which suggests that people maintain some fuzziness and don't fully resolve the structure when they don't need to. [1]

[1] http://csjarchive.cogsci.rpi.edu/proceedings/2009/papers/75/...

> but most researchers would agree that they all reflect real abstractions that humans make

They reflect a particular language in its well-written form. However, humans are extremely robust against syntax errors. I am not a linguist, but I think this speaks in favor of lexicalist approaches: we can be very free in word order, as long as our brain can match up e.g. verbs with their expected arguments.

No, the academic consensus is pretty much the opposite. For example, if you try to rigorously state the way we form yes/no questions in English - the process that converts "the man who has written the book will be followed" to "will the man who has written the book be followed?" instead of the incorrect "has the man who written the book will be followed?" - you will find that the rules must involve imposing some sort of tree structure on the original sentence. The fact that we do it correctly all of the time on sentences we've never seen before means that we must have parsed the original sentence.

(Example sentences taken from https://he.palgrave.com/page/detail/syntactic-theory-geoffre..., although any introductory linguistics/syntax textbook will spend a few pages making the case that humans understand language by first parsing it into some kind of tree structure).
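
To make the contrast concrete, here is a minimal Python sketch of the two candidate rules for that example. Everything in it is a toy assumption: the `AUXILIARIES` set, the function names, and especially the hand-marked split between subject and predicate, which stands in for whatever structure a real parser (or brain) would recover.

```python
# Toy illustration only: the auxiliary list and the hand-marked
# subject/predicate split are assumptions, not a real parser.
AUXILIARIES = {"will", "has", "is", "can"}


def linear_rule(words):
    """Front the first auxiliary found when scanning left to right."""
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return [word] + words[:i] + words[i + 1:]
    return list(words)


def structural_rule(subject, predicate):
    """Front the auxiliary that follows the whole subject phrase."""
    aux, rest = predicate[0], predicate[1:]
    if aux not in AUXILIARIES:
        raise ValueError("predicate must start with an auxiliary")
    return [aux] + subject + rest


subject = "the man who has written the book".split()
predicate = "will be followed".split()

linear = " ".join(linear_rule(subject + predicate))
structural = " ".join(structural_rule(subject, predicate))

print(linear)      # the word-by-word rule grabs "has" from inside the subject
print(structural)  # the structure-aware rule fronts "will"
```

The linear rule reproduces the incorrect question quoted above, while the rule that treats the subject as a single unit produces the correct one. The second rule only works because it is handed the subject as a unit, which is the point of the argument.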

> the process that converts "the man who has written the book will be followed" to "will the man who has written the book be followed?" instead of the incorrect "has the man who written the book will be followed?"

And yet the following is also correct - in terms of real-world usage, not some prescriptive definitions:

"The man who has written the book will be followed, right mate?"

> you will find that the rules must involve imposing some sort of tree structure on the original sentence.

The rules are, and the brain may be, but I feel those are different tree structures. Moreover, I wonder if the "tree structures" of our brains aren't just artifacts of recursive pattern matching - we also know that when reading, humans process whole groups of words at a time, and only if there's some mismatch do they process pieces in more detail. Any recursive process like this will generate a tree structure as its side effect.

Anyway, thanks for the examples. I might pick up a linguistics book at some point. Right now the idea of understanding natural language by parsing it into "NOUN PHRASES" and "VERB PHRASES" and stuff seems completely backwards, given how humans have no trouble parsing "invalid" sentences, or using them - especially in spoken language.

(Not to mention our ability to evolve the language, and how the grammatically invalid constructs tend to be introduced, used, understood with no trouble and at some point they become grammatically accepted - see e.g. recent acceptance of "because <noun>").

Yes, of course your "right mate" example is also grammatically correct. The point is that people routinely and naturally do the complicated transformation to "will the man who has written the book be followed?", and that transformation can't be done by simple pattern matching. Hence, humans who are able to do the complicated transformation must be mentally parsing the sentence. The fact that there is an alternative simple transformation to form the yes-no question is irrelevant because the ability to use the complicated transformation still exists.

> given how humans have no trouble parsing "invalid" sentences

I think you misunderstand slightly - the claim linguists make is not "humans are unable to understand invalid sentences because they can't parse them", the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat" but you also immediately notice that it's "off" - and that's the phenomenon that syntax tries to explain.

Furthermore, the way you understand sentences like "I this morning fish eat" is different from the way you understand "I ate fish this morning"; in the former it feels like you're guessing. It could work for communicating simple thoughts, but I doubt a non-English speaker with an English dictionary could convey a complicated thought requiring many words by that same guessing process. In fact the reason why language evolved tree syntax is probably because it is needed to convey long, complicated thoughts.

> because <noun>

I'm glad you mentioned that! First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field - language changes over time, the study of the rules of language and how these rules change is interesting and important, but it's pointless to "enforce" the rules. Even new constructions like "because X" have rules that govern them, e.g. see http://allthingslinguistic.com/post/72252671648/why-the-new-... - constructions like "because want" and "because need" exist, but no one says "because adore", and something interesting explains why. (To be fair, I haven't really internalized the "because X" construction so I can't claim that I find "because adore" unnatural, but the article says it's the same reason why "omg want" and "omg need" are currently grammatical but "omg adore" is not, and even if you're not familiar with the "omg X" construction, it gives independent evidence in that "omg adore" has no Tumblr tags; of course, it may become grammatical in the future, but that would be because the rules have changed over time, not because there are no rules.) To that point,

> or using [ill-formed sentences] - especially in spoken language.

actually, if a sentence is used in spoken language routinely and non-accidentally, linguists take it as evidence that it's grammatical and then work backwards to find the rules that explain why it is so. How else could they do it?

Thanks for your answers. You've raised a lot of good points, and I need to think them through.

> the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat" but you also immediately notice that it's "off" - and that's the phenomenon that syntax tries to explain.

I see. Yeah, most of the way I think about how the mind processes language comes from focusing on that "gut feeling", which on one hand tells you that this perfectly understandable sentence is somehow "off", and on the other hand lets you form perfect sentences without ever explicitly thinking about grammar.

> First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field

It seems to me that I've been operating under the invalid assumption that linguistics is mostly prescriptive. Thanks for that. Any recommendation for an intro book I could grab to read in my spare time?

> Any recommendation for an intro book I could grab to read in my spare time?

Unfortunately I think the field suffers from a lack of such books.

1. You could try Steven Pinker's "The Language Instinct", although it's a general-audience book that doesn't really try to teach you linguistics proper.

2. The first textbook I used was https://linguistics.osu.edu/research/pubs/lang-files and it's pretty good. However, it's quite hard to obtain.

Edit:

3. If you just want to look at syntax http://web.mit.edu/norvin/www/24.902/24902.html is advanced but good

I'm not sure about the claim on implicit lack of parsing structure. I read your example as who did what, where, in what. There must be some level of structural parsing and recognition so we understand it was Alice who drove in a car, that the car is owned by Alice, and that she, Alice, drove down the street, in her car. That we automatically understand all this seems to indicate some level of implicit parsing, right? Admittedly, it's been many years since I did any study of linguistics and language acquisition, so I'm pretty ignorant of the current state of knowledge here. Am I just layering my grammatical parsing atop an existing understanding that doesn't parse at all?

I think observing how children learn their native language is pretty informative. They can speak and understand it very well, whether or not they were taught formal grammar at school. Personally, I know very, very little of Polish grammar (i.e. of my native language), and only a little bit more of English grammar - and that is only because foreign language courses are pretty heavily grammar-laden.

I'm not a linguist, but seeing how people a) can understand grammatically malformed sentences perfectly well, b) can easily derive meaning out of "sentences" stripped of verbs ("I her dinner cinema Washington"), it seems to me that most of the work is being done by pattern-matching to known words and phrases. E.g. "drove down the street" is a kind of semantic unit on its own.

Again, I'm not a linguist, but a lot of introspection as well as observing other people strongly suggest to me that humans do anything but parse grammatical structures.

It's precisely how strongly we conform to grammar, without having been taught it, which shows that it's key to our internal representations/to how we learn language.

Here is the undeniable proof that syntactic structure exists. Consider the sentence "The magician pointed at the man with the hat." This is a perfectly natural sentence, of which there are two likely interpretations. One is that the magician used a hat to point at the man. The other is that the man who was pointed at wore a hat.

What distinguishes these readings? Only the underlying syntactic structure, of whether to parse it as "the magician pointed at (the man) with the hat" or as "the magician pointed at (the man with the hat)". This "hierarchical structure" of our sentences is syntactic structure at its essence.

You argue that humans can understand sentences with whatever grammar, and parsing is pretty much pattern-matching of words. But what about the sentence pair "Benny chased Jenny" versus "Jenny chased Benny"? These have the same words, and mean different things. It is only our syntactic understanding of how words are ordered in English that allows us to understand these sentences.
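
Both points can be sketched in a few lines of Python. This is only an illustration under assumptions: nested tuples stand in for syntax trees, the bracketings are written by hand rather than produced by a parser, and the variable names are made up for the example.

```python
from collections import Counter


def leaves(tree):
    """Flatten a nested-tuple tree back into its sequence of words."""
    if isinstance(tree, str):
        return tree.split()
    return [word for child in tree for word in leaves(child)]


# Hand-written bracketings (assumed shapes, not parser output).
# Reading 1: "with the hat" attaches to the pointing (the hat is the instrument).
verb_attach = ("the magician", (("pointed at", "the man"), "with the hat"))
# Reading 2: "with the hat" attaches to "the man" (he is wearing the hat).
noun_attach = ("the magician", ("pointed at", ("the man", "with the hat")))

same_words = leaves(verb_attach) == leaves(noun_attach)
different_trees = verb_attach != noun_attach

# Same bag of words, different order, different meaning.
a = "Benny chased Jenny".split()
b = "Jenny chased Benny".split()
same_bag = Counter(a) == Counter(b)
same_order = a == b

print(same_words, different_trees)  # identical word string, distinct structures
print(same_bag, same_order)         # identical bag, distinct sequences
```

The two magician trees flatten to exactly the same words, so whatever separates the two readings is not in the word string itself. Likewise, a pure bag-of-words comparison cannot tell the two "chased" sentences apart.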

> Here is the undeniable proof that syntactic structure exists.

There are multiple hypotheses of what a sequence of words can mean, which is not the same thing as 'we form explicit syntax trees in our heads when reading a sentence'.

I could also give you the bag of words

magician point man hat

You would derive meaning from this bag of words, probably the same interpretations as in your example. However, the sentence is utterly ungrammatical. Note that I am not contending that we don't use some form of syntax at all. E.g., I think that someone whose native language has a freer word order than English will assign more hypotheses to the bag of words above (e.g., my brain also considers the less likely option that the magician is the object).

Another problematic aspect of this hypothesis is that a longer sentence will have so many possible parses that it would take a long time to construct and consider all parses. Moreover, I find it unlikely that we have thousands of exact syntax trees in our head that we compare.
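
For a rough sense of scale, here is a small Python sketch that counts only unlabeled binary bracketings of a word string. That is a simplifying assumption: a real grammar rules out many of these bracketings, and category labels multiply the rest. Under that assumption, the number of binary trees over n words is the Catalan number C(n-1); the function names are made up for the example, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def catalan(n):
    """n-th Catalan number, C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def binary_bracketings(num_words):
    """Number of distinct unlabeled binary trees over num_words leaves."""
    if num_words < 1:
        raise ValueError("need at least one word")
    return catalan(num_words - 1)


for n in (4, 10, 20, 30):
    print(n, binary_bracketings(n))
```

Four words already allow 5 bracketings and ten words allow 4862, so enumerating and comparing every candidate tree stops being plausible for sentences of ordinary length.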

> There are multiple hypotheses of what a sequence of words can mean, which is not the same thing as 'we form explicit syntax trees in our heads when reading a sentence'.

Yeah. I'm playing with a different idea now - maybe that "tree structure" that "undeniably exists" in our brains isn't an explicit syntax tree, but an artifact of recursive, adaptive pattern-matching? I.e. if you look at things like reading speed or "understanding" speed, you'll notice that people tend to process stuff in large blocks until something "does not click", and they have to focus and process the block in detail. That sort of feels like a recursive refinement, and any process that recurses in more than one place generates a tree structure as a side effect.
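
A toy Python sketch of what I mean, very much an assumption about the mechanism rather than a claim about how brains work: `KNOWN_CHUNKS` is a made-up set of familiar phrases, and `process` is a hypothetical function that accepts a block whole if it "clicks" and otherwise splits it and recurses on both parts.

```python
# Hypothetical set of phrases that are recognized as whole units.
KNOWN_CHUNKS = {"drove down the street", "in her car"}


def process(words, known):
    """Accept a block if it is familiar; otherwise split it and refine."""
    phrase = " ".join(words)
    if len(words) == 1 or phrase in known:
        return phrase  # the block "clicks", no further detail needed

    # Prefer a split where one side is a familiar chunk;
    # fall back to the middle if nothing matches.
    cut = len(words) // 2
    for i in range(1, len(words)):
        if " ".join(words[:i]) in known or " ".join(words[i:]) in known:
            cut = i
            break

    # Two recursive calls: the nesting of the results is the "tree".
    return (process(words[:cut], known), process(words[cut:], known))


sentence = "Alice drove down the street in her car".split()
tree = process(sentence, KNOWN_CHUNKS)
print(tree)
```

Nothing in `process` mentions noun phrases or verb phrases, yet the return value is a nested structure simply because the function calls itself on two sub-blocks.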

I'm not sure what you're implying. The fact that we are not consciously aware of parsing grammatical structures doesn't mean it doesn't happen.

For example, we know for sure that the brain applies sophisticated mathematical algorithms to signals coming from the ears to locate sound in 3D space, yet we are certainly not consciously aware of it - we just "know" where the source is located.

Regarding grammar, Chomsky's theory of Universal Grammar holds that we are born with grammar structures in the brain.

Some recent news on it - http://www.medicaldaily.com/noam-chomskys-theory-universal-g...

Disclaimer: this very article was used by a linguistics professor of mine to show why you shouldn't trust popular news reporting and should look at the study itself.

It's a very good study, but does NOT prove "UG" once and for all.

Ah, sure. Those are excellent points. I wasn't really thinking about how we bridge grammatical incorrectness. For myself, perhaps because I'm a grammar nerd, I feel like I always parse someone's mistaken statements into their grammatically correct forms. But I can recognize doing that after I've already figured out what they were intending to say. Same happened with my kids. That's a helpful vector for thinking about the problem, for sure. Thanks!

> I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.

You do need to analyze a sentence to understand it. Think of a classical attachment ambiguity such as "the boy saw the girl with the telescope". There are two readings of the sentence, and just like a Gestalt, you're typically perceiving it as one or the other. This involves a process of disambiguation, which is evidence that you have parsed the sentence.