by JohnStrange 3294 days ago
I kind of disagree with some of the premises in the article. In the late 90s I saw an HPSG grammar for German that could correctly parse, from a syntactic perspective, almost any sentence I threw at it.

The main problem for natural language understanding is not parsing and not even the semantic and pragmatic representations per se; it has always been the understanding. This requires an adequate knowledge representation and the drawing of inferences from it, and I don't believe that any substantial advances have been made in that field. Computational ontologies have grown larger and there are more "frameworks" than you can count, but none of them offer much that is new, and promising approaches like geometric meaning theories are in their infancy. Knowledge representation and, generally speaking, the problem of how to integrate different information sources in useful ways are essentially unsolved problems.

Just my 2 cents. Note that I'm talking about the principal problems, not about specific practical applications for which you can use the statistical sledgehammer to some extent.

Coecke recently commented on Gärdenfors' geometric meaning in the context of his categorical semantics, in arXiv:1608.01402, which I'm finding interesting. What I would welcome is a computational link relating that semantics to older semantic-network-based ideas. For instance, in arXiv:1706.00526 description-logic-based knowledge representation is cast in string-diagrammatic, categorical terms, and that at least puts the meaning realm on the same mathy footing.
> geometric meaning theories

Apologies, I'm an outsider to the field, but what exactly are you referring to here? The whole vector-space semantic embedding that was popularized by works like word2vec?

Geometric meaning theories... this sounds like intriguing stuff. Could you please point to a decent primer text about it?
References for you and the other poster who asked:

Peter Gärdenfors: Conceptual Spaces - the Geometry of Thought. MIT Press 2000 (Paperback 2004).

It is very easy reading. The problems of geometric meaning theory are compositionality and quantification - how to get the expressivity of logical representations in addition to nearness measures, fuzziness and so on. There are some interesting approaches:

Martha Lewis & Jonathan Lawry: Hierarchical conceptual spaces for concept combination. Artificial Intelligence 237 (2016): 204-227.

Diederik Aerts, Liane Gabora, Sandro Sozzo: Concepts and their dynamics: a quantum-theoretic modeling of human thought. Topics in Cognitive Science 5 (4) (2013):737-772. [and other work by Aerts]

Aerts' work fascinates me personally, but it's unfortunately above my level of mathematical maturity. This is a general problem in this literature: maybe some solutions are already there, but they also need to be sold in a way that allows linguists to understand and use the methods. Montague was lucky (well, not personally, of course), because he had scholars who were able to package his dense ideas in more verbose and more accessible textbooks.

Another short book worth reading in my opinion, though very programmatic in nature:

Jens Erik Fenstad: Grammar, Geometry, & Brain. CSLI Publications 2009.
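
To make the "nearness measures, fuzziness" part concrete, here is a minimal sketch of the kind of thing a conceptual space gives you: concepts as prototype points in a quality space, categorization by nearest prototype, and graded typicality that falls off with distance. The two dimensions, the prototype coordinates and the exponential decay are all assumptions made up for illustration, not taken from any of the books above. Note what is missing: nothing here tells you how to combine concepts or handle quantifiers, which is exactly the open problem.

```python
import math

# Hypothetical prototypes in a toy 2-D quality space (sweetness, sourness).
# The coordinates are invented for illustration only.
PROTOTYPES = {
    "lemon": (0.1, 0.9),
    "apple": (0.6, 0.4),
    "banana": (0.8, 0.1),
}

def distance(p, q):
    """Euclidean distance between two points of the quality space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def categorize(point):
    """Return the concept whose prototype lies nearest to the point."""
    return min(PROTOTYPES, key=lambda name: distance(point, PROTOTYPES[name]))

def typicality(point, concept):
    """Graded membership: 1.0 at the prototype, decaying with distance.

    The exponential form is an assumption; any decreasing function works
    for the purpose of this sketch.
    """
    return math.exp(-distance(point, PROTOTYPES[concept]))
```

Something fairly sweet and barely sour, say `categorize((0.7, 0.2))`, lands in the banana region, and `typicality` gives the fuzzy "how banana-like is it" score.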

Semantics is syntax.

All semantics has ever been about is not causing parse errors during the decoding step of the sentence, and the constraints imposed on that.

'Syntax' is usually confined to "low level" concerns, while 'semantics' to those above, but the distinction is arbitrary and artificial.

There is no meaning but usage.

What do you mean?
It doesn't matter what he meant, all that matters is how he said it; your question is meaningless, for after all there is no meaning but usage.

(If it isn't clear, this comment is snide to GP.)

It's also a very strawman version of what I said, to the point of being wildly inaccurate.
Oh, you used the religiously verified word of "strawman"!

What you said made little to no sense and had no backing. Yours was a perfect example of layman speculation without any basis. Nothing you said made any coherent sense, nor had any backing. They don't even deserve a response.

Your comments have repeatedly been violating the HN guidelines by being uncivil and/or unsubstantive and generally nasty. We ban accounts that do this, so please stop doing this, and instead post civilly and substantively (or not at all).

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/newswelcome.html

That we artificially decompose the process of accepting a sentence as, eg, proper English into two phases: syntactic correctness and semantic correctness.

However, that distinction is arbitrary -- there is only the question of whether the sentence is accepted by an agent (eg, a person) as a well-formed sentence.

Any full accounting of the class of well-formed sentences must embed the semantic concerns; violating semantics is a syntax error (albeit not usually a "first order" one). Similarly, even base syntax, such as the subject/verb/object distinction and ordering, carries semantic information about word usage. The distinction between the two is non-existent: a full accounting of either must embed the other.

So semantics is syntax -- if you write a system of rules that only accepts valid sentences, then the rules will end up carrying the semantic structure of the language in them.

Ed:

I suppose I left out why this might matter --

In the quest to build an AI that understands semantics (ie, that "understands meaning"), we can bypass attacking that problem directly by training it (eg, a NN) on the full acceptance task (joined syntax and semantics -- classifying a sentence as proper English or not). We can then truncate the network, cutting away the "low level" features (and perhaps, at the other end, the layers focused on 'yes' or 'no'), to extract a network that has (most of) the (abstract) semantic structure embedded. We could then use these "middle level" features as a sort of Rosetta stone: train low level networks to embed content for them to understand, and high level networks to use their output for decisions, so that "understanding" is repurposed across tasks.

I would argue that using things like Word2Vec (or the resulting vector space of words) is a similar idea.
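
A minimal sketch of the truncation idea, just to show the mechanics. Everything in it is an assumption for illustration: the toy vocabulary, the single hidden layer, and above all the weights, which are random placeholders rather than the result of training on the acceptance task. The bag-of-words input also throws away word order, so a real attempt would need a sequence model.

```python
import math
import random

# Toy vocabulary and layer size, both hypothetical.
VOCAB = ["colorless", "green", "ideas", "sleep", "furiously", "me", "gizmo"]
HIDDEN = 4

rng = random.Random(0)  # fixed seed so the placeholder weights are reproducible
# Placeholder weights: a real system would learn these on the acceptance task.
W1 = [[rng.uniform(-1, 1) for _ in VOCAB] for _ in range(HIDDEN)]
W2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

def encode(sentence):
    """Low level features: bag-of-words counts over the toy vocabulary."""
    tokens = sentence.lower().strip(".").split()
    return [float(tokens.count(word)) for word in VOCAB]

def middle_features(sentence):
    """Truncated network: stop at the hidden layer and return its activations."""
    x = encode(sentence)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def accept_probability(sentence):
    """Full network: hidden layer followed by a sigmoid yes/no head."""
    h = middle_features(sentence)
    z = sum(w * hi for w, hi in zip(W2, h))
    return 1.0 / (1.0 + math.exp(-z))
```

The point is only that `middle_features` is the same network with the yes/no head cut off; those activations are what other networks would be trained to produce or consume.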

1) Colorless green ideas sleep furiously.

2) Me gizmo.

One of the above sentences is valid English, the other one is meaningful. That is the syntax/semantics distinction.

The distinction is a bit fuzzy in places (for instance, inflectional morphology), but does exist.
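
A toy illustration of the distinction, as a sketch only: the grammar (S -> NP VP, NP -> Adj* N, VP -> V Adv?) and the seven-word lexicon are assumptions I made up for these two sentences, not a serious grammar of English. The recognizer looks at nothing but part-of-speech categories.

```python
# Hypothetical lexicon mapping each word to a single part-of-speech tag.
LEXICON = {
    "colorless": "Adj", "green": "Adj", "ideas": "N",
    "sleep": "V", "furiously": "Adv", "me": "Pron", "gizmo": "N",
}

def parse_np(tags, i):
    """NP -> Adj* N. Return the index after the NP, or None on failure."""
    while i < len(tags) and tags[i] == "Adj":
        i += 1
    if i < len(tags) and tags[i] == "N":
        return i + 1
    return None

def parse_vp(tags, i):
    """VP -> V Adv?. Return the index after the VP, or None on failure."""
    if i < len(tags) and tags[i] == "V":
        i += 1
        if i < len(tags) and tags[i] == "Adv":
            i += 1
        return i
    return None

def is_grammatical(sentence):
    """S -> NP VP, and the parse must consume every word."""
    words = sentence.lower().rstrip(".").split()
    if any(w not in LEXICON for w in words):
        return False
    tags = [LEXICON[w] for w in words]
    after_np = parse_np(tags, 0)
    if after_np is None:
        return False
    return parse_vp(tags, after_np) == len(tags)
```

This accepts sentence 1 and rejects sentence 2, even though it knows nothing about what any of the words mean, while a human gets a meaning out of 2 and not out of 1.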

I actually disagree with your assessment.

You contend that the top sentence is valid English; I disagree with that. It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use. Not being one that an English speaker would use makes it invalid English -- it's just a case where the first order approximation is wrong.

Similarly, if your syntax rules reject the second sentence, they're wrong -- since it's a sentence that English speakers can parse: the conclusion can only be that your syntax rules don't actually match the language you're trying to model.

I get the distinction that you're trying to point out with syntax/semantics, but you're ignoring my point: that divide is artificial and 'semantics' as you mean it is merely higher order syntax.

You haven't shown there's an inherent meaning to the difference (ie, that you haven't just drawn an arbitrary line in the sand), just that you can find examples that (naively) fall on different sides of it.

Language is not some inherent property of the Universe; it is an evolved behavior in humans. We can study how humans perform language; and when we do, we find the syntax/semantics distinction to be naturally occurring in humans. For instance, in my example, a native English speaker will find the second sentence "awkward" in a way that they do not for the first sentence. Similarly, a native English speaker will extract a clear meaning from the second sentence in a way that they would not from the first.

It is conceivable that there is some other language (ie, not a natural human language) which does not have a syntax/semantics distinction, but that hypothetical language is not what linguistics studies.

>> It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use.

Well, an English speaker did use it.

Btw, who do you consider an "English speaker"? Do I count as an English speaker? My native language is Greek, and I speak English as a foreign language. I often say things that a native English speaker wouldn't say, but they convey a meaning that I wish to express. Do these utterances count as things that "an English speaker would use", or not?

I say they do. English speakers can say anything they like. In fact, they do, every day, and as they do their language changes along with that.

Human language seems to be a lot more flexible than what you give it credit for. Semantics being just some sort of higher-order syntax (which btw we just haven't found yet) would make for a much more limited language ability than what we currently have. We'd be restricted to only a finite set of forms and we could only say a finite number of things. Obviously, that's not the case.

My favorite example where you need semantics to get the syntax is "See the a are of I."

   +---+---+---+---+
   | x |   |   |   | I
   +---+---+---+---+
   |   |   |   |   | K
   +---+---+---+---+
     a   b   c   d
The map shows 8 ares; the a are of I is marked with a cross.
You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.

It would be correct to say that the sentence in isolation is a parse error, but with the diagram, it's merely elaborate syntax.

My point isn't that there aren't higher order rules (and approximations) -- just that the division of those rules into a separate area of study is artificial.

It's not contentious to point out that chemistry is just an approximation of physics because the actual higher order rules are too complex to study directly -- but it seems to be contentious to point out the same about semantics and syntax.

>> violating semantics is a syntax error

That's a very strange thing to say. The thing with human language is you can say anything you like, including things that make no sense at all and things that are syntactically incorrect. You can easily find examples of meaningless, syntactically correct sentences, like Jabberwocky ("All mimsy were the Borogoves and the mome raths outgrabe" etc). It's also easy to find examples of sensible sentences with incorrect structure (see twitter.com).

In fact, what counts as "incorrect syntax" keeps changing all the time, but we can still say the same things as we always could (plus probably infinitely many new things besides). If syntax were tied to meaning as tightly as you say, we'd probably have only one or two languages and no dialects. Language would be a static, unchanging thing and we'd need no NLP, or translators etc.