by JCraig 4903 days ago
I was thrown off by Might stating that "a grammar defines a language," which is not nearly as useful or factual as saying that it "describes" a language, the wording that he relies on throughout the rest of the article. That is the difference between being able to make a dog and being able to identify a dog based on a set of characteristics.

Grammars are only one part of understanding a language, hardly the "language of languages". In natural languages, grammars are one subset of linguistics. It would be just as valid to say vocabularies or phonology are the language of languages as it would be to say grammars are.

Other than these overly broad arguments and attempts to define natural languages in the same way that formal languages can be defined, this is a nice general introduction to some specific notation techniques for computer languages.

Of course, I might not have read it at all if it were titled "An Introduction to Backus-Naur Form, Extended BNF, and Augmented BNF Notation Techniques".

3 comments

A "language", in this context, is a set of strings (a string is a sequence of symbols, e.g. a program). A grammar defines a language. For example, the regular grammar

  (a|b)(x|y)
defines the language

  {"ax", "ay", "bx", "by"}
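
To make the "set of strings" idea concrete, here is a minimal Python sketch. The names `choices`, `language`, `pattern` and `in_language` are made up for illustration, and Python's `re` module is only standing in for the grammar; none of this comes from the article.

```python
import itertools
import re

# Each group of alternatives in (a|b)(x|y), written out as a list of choices.
choices = [["a", "b"], ["x", "y"]]

# The language is every string formed by picking one alternative per group.
language = {"".join(parts) for parts in itertools.product(*choices)}

# The same grammar written as a regular expression, used as a membership test.
pattern = re.compile(r"(a|b)(x|y)")


def in_language(s: str) -> bool:
    """Return True if s is one of the strings the pattern describes."""
    return pattern.fullmatch(s) is not None


print(sorted(language))                       # ['ax', 'ay', 'bx', 'by']
print(in_language("ax"), in_language("xa"))   # True False
```

Enumerating only works here because this language is finite; for most grammars you can only test membership, which is what `in_language` does.
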
Unfortunately, the term "language" has other meanings. There are human languages, like English. There are also programming languages, like Lisp, Python, Java. And markup languages like HTML and XML. And other computer-related non-programming languages.

While it's true that these other languages have more to them than their syntax, they do define a "language" in the above initial sense: the set of all valid instances of it (i.e. without syntax errors), the set of sequences of symbols.

Programming languages generally include ways of extending their language (in the initial sense). Even Java: a Java program includes a syntax for extending its syntax (its "language"), in the sense that a program using a certain method invocation becomes valid if that method is defined. Thus, a program is itself both a definition of a grammar and an instance within that grammar, like XML and XSD combined in one (or XML and DTD).

BTW: this reply (and the two similar ones) will probably annoy you, because you know what a "formal language" is (at least, you use the term). I think the misinterpretation is yours: the article does not claim anything about "natural languages", only about the shape/structure of a language ("So, what shapes languages? Grammars do."/"Behind every language, there is a grammar that determines its structure.").

To be fair though, it then jumps straight into "A grammar defines a language.", without noting a shift in the meaning of the term "language". I think its meaning is clear from context, but it's certainly misleading to shift terminology as you go along!

In formal language theory, '[formal] language' and '[formal] grammar' are well-defined mathematical terms, and it's indeed appropriate to say that a grammar defines a language in that context.

Similarly, the 'language of languages' is also appropriate given that BNF is defined with a grammar, and is used to specify grammars.

Furthermore, formal grammars were originally invented for purposes of exploring natural languages.

Sorry if you didn't enjoy the article, but there's nothing wrong with it in the context of formal language theory :)

Ya, grammar only describes the syntax of a language. You still have the semantics and pragmatics to consider!

I'm not even sure if an XBNF is the best way to describe or reason about language syntax. Precedence grammars (with hacks to handle braces) are quite interesting for robust, error-tolerant parsing, and might more closely mirror how we internalize grammars in our heads.

Technically, the grammar describes the language of the language, where the "language of the language" means the formal language: the set of characters and strings that are valid. (By contrast, the union of all of the characters allowed in every char or string that is valid in the language is the language's alphabet; not all members of the alphabet may be allowed to stand alone as a token in a given language.)
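
A rough Python sketch of that alphabet/language distinction, reusing the four strings from the (a|b)(x|y) example earlier in the thread. The variable names are mine and purely illustrative.

```python
# The formal language from the (a|b)(x|y) example: a finite set of strings.
language = {"ax", "ay", "bx", "by"}

# The alphabet is the union of all symbols that appear in any valid string.
alphabet = set().union(*language)

# Symbols of the alphabet that are also valid strings on their own.
standalone = {symbol for symbol in alphabet if symbol in language}

print(sorted(alphabet))   # ['a', 'b', 'x', 'y']
print(standalone)         # set()
```

Here every symbol is in the alphabet, yet none of them is a valid string by itself, which is the point about alphabet members not necessarily standing alone.
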
My PhD in PL tells me you're right, but I'm always on the lookout for a more intuitive vs. technical definition of language, even for programming.
You could always use META II which defines the grammar and the semantics :D