Hacker News
by gizmo686 3289 days ago
1) Colorless green ideas sleep furiously.

2) Me gizmo.

One of the above sentences is valid English, the other one is meaningful. That is the syntax/semantics distinction.

The distinction is a bit fuzzy in places (for instance, inflectional morphology), but does exist.
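A minimal sketch of the distinction as a toy grammar check, assuming a hand-made lexicon and a single sentence rule (both are hypothetical and far too small to model English). The checker looks only at part-of-speech tags and never at meaning, so it accepts sentence 1 and rejects sentence 2:

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag (an assumption for illustration).
LEXICON = {
    "colorless": "ADJ", "green": "ADJ",
    "ideas": "NOUN", "gizmo": "NOUN",
    "sleep": "VERB",
    "furiously": "ADV",
    "me": "PRON_OBJ",
}

# Single toy rule: S -> ADJ* NOUN VERB ADV?
SENTENCE_PATTERN = re.compile(r"(ADJ )*NOUN VERB( ADV)?")

def is_grammatical(sentence):
    """Return True if the tag sequence matches the toy rule; meaning is never consulted."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]
    if None in tags:
        return False  # unknown word: the toy lexicon cannot tag it
    return SENTENCE_PATTERN.fullmatch(" ".join(tags)) is not None

print(is_grammatical("Colorless green ideas sleep furiously."))
print(is_grammatical("Me gizmo."))
```

Whether a rule set this crude counts as "the" syntax of English is exactly what the replies dispute; the sketch only shows that a purely structural check can be written without reference to meaning.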

I actually disagree with your assessment.

You contend that the top sentence is valid English; I disagree with that. It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use. Not being one that an English speaker would use makes it invalid English -- it's just a case where the first order approximation is wrong.

Similarly, if your syntax rules reject the second sentence, they're wrong -- since it's a sentence that English speakers can parse: the conclusion can only be that your syntax rules don't actually match the language you're trying to model.

I get the distinction that you're trying to point out with syntax/semantics, but you're ignoring my point: that divide is artificial and 'semantics' as you mean it is merely higher order syntax.

You haven't shown there's an inherent meaning to the difference (i.e., that you haven't just drawn an arbitrary line in the sand), just that you can find examples that (naively) fall on different sides of it.

Language is not some inherent property of the Universe; it is an evolved behavior in humans. We can study how humans perform language; and when we do, we find the syntax/semantics distinction to be naturally occurring in humans. For instance, in my example, a native English speaker will find the second sentence "awkward" in a way that they do not for the first sentence. Similarly, a native English speaker will extract a clear meaning from the second sentence in a way that they would not from the first.

It is conceivable that there is some other language (e.g., one that is not a natural human language) which does not have a syntax/semantics distinction, but that hypothetical language is not what linguistics studies.

.....Which is an effect of the first satisfying first order approximations while failing higher order rules, while the latter is merely an unusual sentence and so requires more effort to parse because it falls off the "fast path". (It also arguably fails to encode embedded cultural messages present in word choice -- a second consideration for why it feels "awkward": it's valid English, but not my tribe's English.)

You haven't pointed out how semantics is anything but higher order syntax -- merely outlined the way in which higher order syntax interacts with our perception.

I agree that there's a difference between the two sentences -- I disagree that it's because they're different fields of study instead of different edge cases of the same underlying notion of parsing syntax. (I especially disagree that the way forward on teaching machines language involves that distinction.)

I would appreciate you referring me to references on the semantic/syntax divide being "natural", though.

>> It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use.

Well, an English speaker did use it.

Btw, who do you consider an "English speaker"? Do I count as an English speaker? My native language is Greek but I speak English as a foreign language. I often say things that a native English speaker wouldn't say, but they convey a meaning that I wish to express. Do these utterances count as things that "an English speaker would use", or not?

I say they do. English speakers can say anything they like. In fact, they do, every day, and as they do their language changes along with that.

Human language seems to be a lot more flexible than what you give it credit for. Semantics being just some sort of higher-order syntax (which btw we just haven't found yet) would make for a much more limited language ability than what we currently have. We'd be restricted to only a finite set of forms and we could only say a finite number of things. Obviously, that's not the case.

There was an implied "in isolation" on that sentence -- what's proper with other clauses and sentences included isn't necessarily proper alone.

An English speaker used it as an example of a statement that would cause a parse error for most English speakers, and so it did. The speaker said it even caused such a reaction in them. I would argue that they weren't attempting to use English, but quasi-English in an attempt to communicate the boundaries of English to people who can parse English (which inherently has some ability to parse quasi-English).

I don't think it's useful to pretend "English" is a coherent class of parsing rules (either over time or over population) -- there's only a roughly similar set of parsers undergoing continuous memetic evolution, broken up into subsets that are more similar.

At the end of the day, English is as people who can parse some subset of it do -- and it might reach the point where it makes more sense to talk about English languages than an English language.

That being said, your last paragraph confuses me:

It's not obvious to me that we aren't restricted to a finite number of forms in language.

It's not clear to me why you think semantics being higher order syntax requires that it only be capable of finitely many forms.

(The rest of it seems dependent on those two conclusions.)

My favorite example where you need semantics to get the syntax is "See the a are of I."

   +---+---+---+---+
   | x |   |   |   | I
   +---+---+---+---+
   |   |   |   |   | K
   +---+---+---+---+
     a   b   c   d
The map shows 8 ares; the a are of I is marked with a cross.

You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.

It would be correct to say that the sentence in isolation is a parse error, but with the diagram, it's merely elaborate syntax.

My point isn't that there aren't higher order rules (and approximations) -- just that the division of those rules into a separate area of study is artificial.

It's not contentious to point out that chemistry is just an approximation of physics because the actual higher order rules are too complex to study directly -- but it seems to be contentious to point out the same about semantics and syntax.

>> You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.

Syntax and semantics, or structure and meaning, are completely different things. In fact, they are entirely unrelated and their only association is by arbitrary convention (we all agree that certain structures are associated with specific meanings).

This is why you can do translation, for example, where you're essentially taking the semantics out of one kind of syntax and putting it into another.

In NLP it's easy enough to reproduce the structure of a corpus: simple, unsmoothed n-grams will do that well enough already, and with a little more statistical elbow grease you can train a model that reproduces your text very well and even generates new text that looks quite reasonable. Except, of course, that it rarely makes any sense at all. To generate text that is both syntactically correct and makes sense you need a lot more than that, and we haven't really managed to do that except for very short stretches (a few words at a time).
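A minimal sketch of the unsmoothed n-gram idea, using bigrams over a made-up toy corpus (the corpus, function names, and seed are assumptions for illustration, not anything from a real NLP library). Every adjacent word pair in the output was seen in training, so local structure is reproduced, yet nothing in the model tracks whether the result makes sense:

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    """Map each token to the list of tokens observed directly after it (no smoothing)."""
    followers = defaultdict(list)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev].append(nxt)
    return followers

def generate(followers, start, length, seed=0):
    """Sample up to `length` tokens by repeatedly picking an observed follower."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        choices = followers.get(out[-1])
        if not choices:
            break  # dead end: this token was never followed by anything in training
        out.append(rng.choice(choices))
    return out

# Hypothetical toy corpus, assumed purely for illustration.
corpus = "the dog chased the cat and the cat chased the idea and the idea slept".split()
model = train_bigrams(corpus)
sample = generate(model, "the", 8)
print(" ".join(sample))
```

Repeated followers stay in the list, so sampling is proportional to raw counts, which is what "unsmoothed" amounts to here.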

I'm saying: in NLP we can deal with structure very well indeed, but meaning is still a long way off. If it was just a matter of "more syntax", we'd have solved all our problems a long time ago.

Translation is hard precisely because semantics is carried by syntax -- when the syntax is radically different, you can only approximate the higher order structures.

If semantics were actually distinct and carried by both languages, you could translate without that loss of subtle meaning.

I also completely disagree with your last sentence: semantics could easily be syntax that has rules which are hard to compute.

NLP has trouble with long-range (or broad) effects, which are (some of) what I mean by higher order syntax.

Syntax is usually used to describe those rules of a language that are easy to compute; so easy that you don't have to understand the meaning (semantics) to do it. E.g. you can point to a missing semicolon in a C program without understanding what the program does.
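The same point can be sketched in Python rather than C (a swapped-in analogue of the missing-semicolon example, using the standard-library `ast` module; the helper names and sample snippets are assumptions for illustration). Parsing decides well-formedness without running anything, so a program can pass the syntax check and still fail once its meaning matters:

```python
import ast

def is_syntactically_valid(source):
    """Check well-formedness only: the source is parsed, never executed."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def runs_without_error(source):
    """Actually execute the source in an empty namespace to see whether it works."""
    try:
        exec(source, {})
        return True
    except Exception:
        return False

well_formed_but_broken = "result = undefined_name + 1"  # parses, but the name does not exist
malformed = "result = (1 + "                            # unclosed parenthesis

print(is_syntactically_valid(well_formed_but_broken), runs_without_error(well_formed_but_broken))
print(is_syntactically_valid(malformed), runs_without_error(malformed))
```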

Of course you can call the rules of well-formedness "higher-order syntax" in the sense that the computation required to decide it is of a higher order than syntax, but the distinction between syntax and semantics is by no means unnatural. It has been discovered independently several times; some ancient studies of the syntax of Sanskrit have survived to this day.