Hacker News new | ask | show | jobs
by winsbe01 5394 days ago
nice read! I wonder how much of it is still relevant today (i.e. gc speed in Java, relative superiority of its library, etc.). OCaml is one of those languages that interests me, but I don't know if I'd ever have a reason to learn it to do something specific, when the languages I've been using are already pretty flexible.

Also (unfortunately, IMHO), this talks a lot about speed, which was a big problem in the Pentium 200 days with minimal RAM, cache, etc. Nowadays, any old poorly written, garbage-collected program runs speedy as hell. Seems like lost is the golden days of optimization.

1 comments

Point 4 (algebraic data types) is really what's important; especially tagged unions. ML's type constructors map very very well to abstract syntax trees. Java has no equivalent (enums are a distant cousin).

Of course if you are a believer in higher-order abstract syntax (the idea that one should use constructs such as lambda abstractions as part of your syntax tree), Scheme is a better fit, as ML's type system doesn't allow for such wildly typed syntax trees, nor does it allow data-as-code. (I think HOAS is hogwash, but then I'm a firm believer in well-typed code and against data-as-code.)

Of course, if your AST is more a graph than a tree, you're better with a logic language such as Mercury or Prolog. I've had very good experiences writing compilers and interpreters in Mercury.

I'm curious as to what a language with a less tree-like, more graph-like AST might look. Could you provide any examples of such languages?
Many graphical languages (LabView, Simulink) are graphs. Of course you can represent them as trees if you use variable names, but that's just obscuring the fact that it's a tree.
For instance, this appears when you do common subexpression elimination -- if you refer to the same code in two or more places (e.g. call the same function, compute the same math expression), and it's known not to have (or depend on) any side effects, you can compute its value only once and refer to this value in these places. I recall it's covered even in Dragon Book, which is kind of old.
That sounds rather confused, an Abstract Syntax Tree by definition can't be a graph. What you are referring to is a some kind of an intermediate form used by a compiler, the details may very greatly, but for example directed acyclic graphs are often used for common subexpression elimination, but they are most often generated from some other intermediate form (straight-line code) and they have more in common with the code that has to be generated then with the original source code. Of course one can imagine annotating the AST and performing this optimization directly on it, but then it is not an AST anymore.
I never used "AST" term in my comment. I assumed that intermediate DAG is what the parent poster really meant.
Uh, there are many papers and examples of using HOAS in statically typed settings, such as ML, Haskell, Twelf, Coq, etc. There is still the issue of exotic terms, but there are known type system tricks for helping with that as well. And the exotic term problem would be even worse in Scheme.
OK, thanks for pointing those out to me. Google leads me here: http://adam.chlipala.net/cpdt/html/Hoas.html which discusses the typing issues.

But what of actually parsing a program in HOAS? The idea of HOAS is to translate the text "\x -> x + 5" into the expression Lam (\x -> Op("+", Var x, Const 5)). But how does one translate the string "x" into the variable name x short of data-as-code (or a similar meta-facility)?

The best Google turns up for this problem is here: http://books.google.com/books?id=OxeBw-sH4SUC&lpg=PA50&#... which seems to address the problem using a meta-facility of lambda-Prolog.

The best example I have handy is some Scala code a student I worked with wrote in Scala. I'm not sure if this is necessarily compatible with the latest version of their parser combinator library, but is hopefully suggestive:

  import scala.util.parsing.combinator.syntactical.StandardTokenParsers
  import scala.util.parsing.input._
  import scala.util.parsing.syntax._


  object HoasTest extends StandardTokenParsers {
    lexical.delimiters ++= List("(", ")", "\\", ".", ":", "=", "->", "+", "{", "}", ",", "*")
    lexical.reserved   ++= List("Bool", "Nat", "true", "false", "if", "then", "else", "succ",
                              "pred", "iszero", "let", "in", "fst", "snd")

    import lexical.NumericLit
    import lexical.Identifier

    def Term(base: Parser[Term]): Parser[Term] = positioned(
        SimpleTerm(base) ~ SimpleTerm(base) ^^ { case fun~arg => Apply(fun,arg) } // should be left-assoc
      | SimpleTerm(base) ~ ("+" ~> SimpleTerm(base)) ^^ { case a~b => Plus(a,b) } // should be left-assoc
      | SimpleTerm(base)
      | failure("illegal start of term"))

    def SimpleTerm(base: Parser[Term]): Parser[Term] = positioned(
        (("\\" ~> ident <~ ".") into (x => Parser { in =>

          def body(arg: Term): ParseResult[Term] = Term(Identifier(x) ^^^ arg | base)(in)
        
          body(new Term {}) match {
            case Success(_,rest) => Success(Lambda(x, (arg) => body(arg).get), rest)
           case f => f
          }
        }))
      | "(" ~> Term(base) <~ ")"
      | base
      | failure("illegal start of simple term"))

    def BaseTerm: Parser[Term] = positioned(
        numericLit ^^ { case n => Num(n.toInt) }
      | failure("unrecognized base term"))

    abstract class Term extends Positional

    case class Num(n: Int) extends Term

    case class Plus(a: Term, b: Term) extends Term

    case class Lambda(x: String, body: Term=>Term) extends Term {
      override def toString = "\\"+x+".("+body(new Term { override def toString = x }).toString+")"
    }

    case class Apply(fun: Term, arg: Term) extends Term

  }
How do you represent a graph in Mercury?
As a pair of predicates, one for nodes, one for edges:

:- pred asg_nodes(asg, node).

:- mode asg_nodes(in, out) is nondet.

:- mode asg_nodes(in, in) is semidet.

:- pred asg_edges(asg, node, node).

:- mode asg_edges(in, out, out) is nondet.

:- mode asg_edges(in, in, in) is semidet.

Underlying are traditional sets or what-have-you but you never have to deal with these at high levels of coding. You could then do fancy stuff like defining ancestry as:

:- pred asg_ancestor(asg, node, node).

:- mode asg_ancestor(in, out, out) is nondet.

:- mode asg_ancestor(in, in, in) is semidet.

asg_ancestor(ASG, A, A).

asg_ancestor(ASG, A, B) :- asg_edge(A, C), asg_ancestor(C, B).

Of course you'd want to wrap this stuff up in a typeclass or some such and give them useful names.

Thanks!