Oh, you used the religiously verified word of "strawman"!
What you said made little to no sense and had no backing. Yours was a perfect example of layman speculation without any basis. It doesn't even deserve a response.
Your comments have repeatedly been violating the HN guidelines by being uncivil and/or unsubstantive and generally nasty. We ban accounts that do this, so please stop doing this, and instead post civilly and substantively (or not at all).
That we artificially decompose the process of accepting a sentence as, eg, proper English into two phases: syntactic correctness and semantic correctness.
However, that distinction is arbitrary -- there is only the question of whether the sentence is accepted by an agent (eg, person) as a well-formed sentence.
Any full accounting of the class of well-formed sentences must embed the semantic concerns; violating semantics is a syntax error (albeit, not usually a "first order" one). Similarly, even base syntax, such as the subject/verb/object distinction and ordering, carries semantic information about word usage. The distinction between the two is non-existent: a full accounting of either must embed the other.
So semantics is syntax -- if you write a system of rules that only accepts valid sentences, then the rules will end up carrying the semantic structure of the language in them.
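A toy sketch of what I mean, under heavy assumptions: the lexicons, category names, and rules below are all made up for illustration and are not a real grammar of English. The coarse rule set is the "first order" approximation; the fine one only accepts the sentences we want, and to do that it has had to absorb an animacy constraint into its categories.

```python
# Hypothetical two-word "language": every name here is invented for illustration.

# First-order approximation: only part of speech is tracked.
COARSE_LEXICON = {"dogs": "N", "ideas": "N", "sleep": "V", "spread": "V"}
COARSE_RULES = {("N", "V")}

# Refined rules: the categories themselves now encode who can do what.
FINE_LEXICON = {
    "dogs": "N_animate",
    "ideas": "N_abstract",
    "sleep": "V_needs_animate",
    "spread": "V_any",
}
FINE_RULES = {
    ("N_animate", "V_needs_animate"),
    ("N_animate", "V_any"),
    ("N_abstract", "V_any"),
}


def accepts(sentence, lexicon, rules):
    """Return True if the sentence's category sequence is licensed by the rules."""
    words = sentence.lower().split()
    if any(word not in lexicon for word in words):
        return False
    return tuple(lexicon[word] for word in words) in rules


if __name__ == "__main__":
    for s in ["dogs sleep", "ideas spread", "ideas sleep"]:
        print(s, accepts(s, COARSE_LEXICON, COARSE_RULES), accepts(s, FINE_LEXICON, FINE_RULES))
```

Both rule sets are "just syntax" in the sense that they are category sequences; the second one simply carries more of the structure that usually gets filed under semantics.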
Ed:
I suppose I left out why this might matter --
In the quest to build an AI that understands semantics (ie, that "understands meaning"), we can bypass attacking that problem directly. Train it (eg, a NN) on the full acceptance task (joined syntax and semantics -- classifying a sentence as proper English or not), then truncate the network away from the "low level" features (and perhaps also, at the other end, away from the final 'yes' or 'no' decision) to extract a network that has (most of) the (abstract) semantic structure embedded. We could then use these "middle level" features as a sort of Rosetta stone: train low level networks to embed content into them, and train high level networks to use their output for decisions, so that the "understanding" is repurposed across tasks.
I would argue that using things like Word2Vec (or the resulting vector space of words) is a similar idea.
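A minimal sketch of the truncation step, assuming a tiny feed-forward net in plain Python. The weights are random and untrained, so the outputs mean nothing; `AcceptanceNet`, `features`, and `accepts` are hypothetical names, and a real version would be trained on the accept/reject task first. The only point is the shape of the idea: the yes/no head is one consumer of the middle layer, and other tasks could read the same layer.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights); untrained, illustration only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(layer, x):
    """One dense layer with tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in layer]


class AcceptanceNet:
    """Hypothetical accept/reject classifier: low -> mid -> yes/no head."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.low = make_layer(n_in, n_hidden, rng)
        self.mid = make_layer(n_hidden, n_hidden, rng)
        self.head = make_layer(n_hidden, 1, rng)

    def features(self, x):
        """The 'truncated' network: stop at the middle layer, skip the yes/no head."""
        return apply_layer(self.mid, apply_layer(self.low, x))

    def accepts(self, x):
        """The full acceptance task: a yes/no decision built on the middle features."""
        return apply_layer(self.head, self.features(x))[0] > 0.0


if __name__ == "__main__":
    net = AcceptanceNet(n_in=4, n_hidden=3)
    encoded_sentence = [1.0, 0.0, 0.5, -1.0]  # stand-in for an encoded sentence
    print(net.features(encoded_sentence))
    print(net.accepts(encoded_sentence))
```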
You contend that the top sentence is valid English; I disagree with that. It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use. Not being one that an English speaker would use makes it invalid English -- it's just a case where the first order approximation is wrong.
Similarly, if your syntax rules reject the second sentence, they're wrong -- since it's a sentence that English speakers can parse: the conclusion can only be that your syntax rules don't actually match the language you're trying to model.
I get the distinction that you're trying to point out with syntax/semantics, but you're ignoring my point: that divide is artificial and 'semantics' as you mean it is merely higher order syntax.
You haven't shown there's an inherent meaning to the difference (ie, that you haven't just drawn an arbitrary line in the sand), just that you can find examples that (naively) fall on different sides of it.
Language is not some inherent property of the Universe; it is an evolved behavior in humans. We can study how humans perform language; and when we do, we find the syntax/semantics distinction to be naturally occurring in humans. For instance, in my example, a native English speaker will find the second sentence "awkward" in a way that they do not for the first sentence. Similarly, a native English speaker will extract a clear meaning from the second sentence in a way that they would not from the first.
It is conceivable that there is some other language (eg. not natural human language) which does not have a syntax/semantics distinction, but that hypothetical language is not what linguistics studies.
.....Which is an effect of the first satisfying first order approximations while failing higher order rules, while the latter is merely an unusual sentence and so requires more effort to parse because it falls off the "fast path". (It also arguably fails to encode embedded cultural messages present in word choice -- a second consideration for why it feels "awkward": it's valid English, but not my tribe's English.)
You haven't pointed out how semantics is anything but higher order syntax -- merely outlined the way in which higher order syntax interacts with our perception.
I agree that there's a difference between the two sentences -- I disagree that it's because they're different fields of study instead of different edge cases of the same underlying notion of parsing syntax. (I especially disagree that the way forward on teaching machines language involves that distinction.)
I would appreciate you referring me to references on the semantic/syntax divide being "natural", though.
>> It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use.
Well, an English speaker did use it.
Btw, who do you consider an "English speaker"? Do I count as an English speaker? My native language is Greek but I speak English as a foreign language. I often say things that a native English speaker wouldn't say, but they convey a meaning that I wish to express. Do these utterances count as things that "an English speaker would use", or not?
I say they do. English speakers can say anything they like. In fact, they do, every day, and as they do their language changes along with that.
Human language seems to be a lot more flexible than what you give it credit for. Semantics being just some sort of higher-order syntax (which btw we just haven't found yet) would make for a much more limited language ability than what we currently have. We'd be restricted to only a finite set of forms and we could only say a finite number of things. Obviously, that's not the case.
There was an implied "in isolation" on that sentence -- what's proper with other clauses and sentences included isn't necessarily alone.
An English speaker used it as an example of a statement that would cause a parse error for most English speakers, and so it did. The speaker said it even caused such a reaction in them. I would argue that they weren't attempting to use English, but quasi-English in an attempt to communicate the boundaries of English to people who can parse English (which inherently has some ability to parse quasi-English).
I don't think it's useful to pretend "English" is a coherent class of parsing rules (either over time or over population) -- there's only a roughly similar set of parsers undergoing continuous memetic evolution, broken up into subsets that are more similar.
At the end of the day, English is as people who can parse some subset of it do -- and it might reach the point where it makes more sense to talk about English languages than an English language.
That being said, your last paragraph confuses me:
It's not obvious to me that we aren't restricted to a finite number of forms in language.
It's not clear to me why you think semantics being higher order syntax requires that it only be capable of finitely many forms.
(The rest of it seems dependent on those two conclusions.)
You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.
It would be correct to say that the sentence in isolation is a parse error, but with the diagram, it's merely elaborate syntax.
My point isn't that there aren't higher order rules (and approximations) -- just that the division of those rules into a separate area of study is artificial.
It's not contentious to point out that chemistry is just an approximation of physics because the actual higher order rules are too complex to study directly -- but it seems to be contentious to point out the same about semantics and syntax.
>> You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.
Syntax and semantics, or structure and meaning, are completely different things; in fact they are entirely unrelated and their only association is by arbitrary convention (we all agree that certain structures are associated with specific meaning).
This is why you can do translation, for example, where you're essentially taking the semantics out of one kind of syntax and putting it into another.
In NLP it's easy enough to reproduce the structure of a corpus: simple, unsmoothed n-grams will do that well enough already, and with a little more statistical elbow grease you can train a model that reproduces your text very well and even generates new text that looks quite reasonable. Except of course that it rarely makes any sense at all. To generate text that is both syntactically correct and makes sense you need a lot more than that, and we haven't really managed to do that except for very short durations (a few words at a time).
I'm saying: in NLP we can deal with structure very well indeed, but meaning is still a long way off. If it was just a matter of "more syntax", we'd have solved all our problems a long time ago.
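For the curious, a rough sketch of the unsmoothed n-gram idea with n=2, in plain Python. The corpus string and the function names `train_bigrams` and `generate` are made up for illustration; this is a toy, not how a production language model is built. Every adjacent word pair it emits was seen in the corpus, so the local structure is reproduced, but nothing in the table tracks what the text is about.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Map each word to the list of words that followed it (unsmoothed counts)."""
    words = text.split()
    table = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        table[prev].append(nxt)
    return table


def generate(table, start, length, seed=0):
    """Walk the bigram table, sampling each next word by observed frequency."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = table.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical toy corpus.
    corpus = "the dog chased the cat and the cat chased the mouse and the mouse ate the cheese"
    table = train_bigrams(corpus)
    print(generate(table, "the", 12))
```

Run it with a few different seeds and you get locally plausible word sequences whose overall "story" is accidental, which is the gap I'm talking about.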
Translation is hard precisely because semantics is carried by syntax -- when the syntax is radically different, you can only approximate the higher order structures.
If semantics were actually distinct and carried by both languages, you could translate without that loss of subtle meaning.
I also completely disagree with your last sentence: semantics could easily be syntax that has rules which are hard to compute.
NLP has trouble with long-range (or broad) effects, which are (some of) what I mean by higher order syntax.
That's a very strange thing to say. The thing with human language is you can say anything you like, including things that make no sense at all and things that are syntactically incorrect. You can easily find examples of meaningless, syntactically correct sentences, like Jabberwocky ("All mimsy were the Borogoves and the mome raths outgrabe" etc). It's also easy to find examples of sensible sentences with incorrect structure (see twitter.com).
In fact, what is "incorrect syntax" keeps changing all the time, but we can still say the same things as we always could (plus probably infinitely many new things besides). If syntax was tied to meaning as tightly as you say, we'd probably have only one or two languages and no dialects. Language would be a static, unchanging thing and we'd need no NLP, or translators etc.
(If it isn't clear, this comment is snide to GP.)