Hacker News new | ask | show | jobs
by teo_zero 1045 days ago
Am I the only one lost at the very first paragraph? It shows something that looks like a grammar, but I can't understand what "Num", "Int", "If" are... Are they literals? terms? types? And what does "data" mean?

So it's not a grammar. "data Exp" might describe each internal node of the AST, where the first word is the node's variety and the following words describe the children nodes. But what does "data Type" mean, then?

6 comments

No pun intended: this is called datatype, algebraic data types, ADT, etc. They are popular in many functional languages.

You can think of them as enumerators that can possibly hold extra data.

The first definition (data Exp) is indeed a grammar. Thus "Num", "Int", "If" are just the names we give to different cases we encounter in the grammar. As for what they are... you can think of them as functions that wrap some data inside a struct of the same name (function names and type names don't collide). As for the name, they are called data constructors.

As for what "data" means, we are defining a new type which can hold some data (again). You can think of it as a synonym for "enum". The reason word "data" used here is because there's something called type synonyms. For example one can write:

type Email = String

type Password = String

Here the underlying representation for both email and password is string, but compiler treats Email and Password as another name for string. But if you were to use data constructors:

data Email = Email String

data Password = Password String

We are creating entirely new types. Even though the internal representation is still string, if we were to supply password when an email is expected, we get a compile error.

Essentially a wrapper class can do the same. But using data constructors we get the same benefit + something called pattern matching + exhaustive error checks + cleaner syntax + additional low-level optimizations by compiler.

> The first definition (data Exp) is indeed a grammar

No, it isn't. It's an AST datatype. That definition does not at all cover the relationship between text and structure, as a grammar does.

  data Exp = Num Int
           | Bool Bool
           | Var Var
           | If Exp Exp Exp
           | Lambda Var Exp
           | App Exp Exp
An Expression is either:

- a Number literal, with an Integer value

- a Boolean literal, with a Boolean value

- a Variable declaration, with some identifier

- an If expression (probably), with a condition, taken, and not-taken Expressions

- a Lambda, which substitutes a Variable into an Expression

- an Apply, which is how we chain Expressions and apply the results of one expression onto the next

  data Type = TyInt | TyBool | TyArrow Type Type
Our Types are TypeInt, TypeBool, or a Function (TyArrow) which goes from an input Type to an output Type
See https://wiki.haskell.org/Type for a brief description and the following for something similar to the article: https://wiki.haskell.org/Parsing_expressions_and_statements
That looks like Haskell source code. Well, maybe some similar language.

If that's indeed Haskell, then those words that start with a capital letter are "data constructors". Well, think a decent language will usually have one or two data constructors: cons for lists and something to make arrays. But, in Haskell, for some reason... they allow you to define your own.

Functionally, the thing is a preparation for the follow-up code that matches some data bits to one of these "classes" and executes some bits of code if the "class" matched.

I think the post assumes some ML background (here OCaml). it is defining a type for the AST in a compiler.
I guess the author wrote the article with an audience that understands ML linage of programming languages.

I had no issue understanding it is an AST for lambda calculus, as it is clearly mentioned, however I know these kind of languages since Caml Light.