Hacker News
by mahmud 6111 days ago
[Summary: read this paper instead http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf]

The article is both acceptable and appreciated, but not good.

There are far better, not to mention easier, ways to start hacking on a compiler quickly than doing it with Flex/Bison/LLVM in C++. Look at this over-engineering:

http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/4/

A compiler should be written as a fluid, jelly-like organism; you will be changing it so much and so often that it's a waste of time to introduce any structure like that so early. The only place where you need a heavy design is the intermediate representation, and even there you want the most flexible "design" possible: if you can get away with Lisp-like S-expressions, by all means do it.
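To make the S-expression point concrete, here is a minimal sketch in Python (not from the paper or the article). It assumes a made-up mini-IR where a node is an int, a variable name, or a list whose first element is the operator; `show` and `fold_constants` are hypothetical names picked for illustration.

```python
# Hypothetical mini-IR: a node is an int, a str (variable name),
# or a list of the form [operator, arg1, arg2, ...].

def show(ir):
    """Render an IR node as a human-readable S-expression string."""
    if isinstance(ir, list):
        return "(" + " ".join(show(x) for x in ir) + ")"
    return str(ir)

def fold_constants(ir):
    """Return a new IR in which (+ int int) and (* int int) are evaluated."""
    if not isinstance(ir, list) or not ir:
        return ir
    op, *args = ir
    args = [fold_constants(a) for a in args]
    if op in ("+", "*") and len(args) == 2 and all(isinstance(a, int) for a in args):
        return args[0] + args[1] if op == "+" else args[0] * args[1]
    return [op, *args]

program = ["let", "x", ["+", 1, 2], ["*", "x", ["+", 3, 4]]]
print(show(program))
print(show(fold_constants(program)))
```

A pass here is just a function from tree to tree, and you can print the IR at any point without writing a dumper first. That is roughly what "no structure" buys you early on.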

You will be annotating the intermediate representation in multiple phases, so don't hesitate to copy it deeply instead of mutating it with surgery. Don't bother with an elaborate symbol table design; just use the cheapest, easiest hash table you can find. Keep your IR human-readable or you will be forced to write binary analysis tools before you even settle on an IR format (a horrible chicken-and-egg problem, and that's what you get when you model your IR with a giant C union. You know that trick; don't do it!)
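A rough sketch of what "copy deeply, cheap hash table" can look like, again in Python and under loud assumptions: the IR is nested lists, `let` nodes look like `["let", name, init, body]`, scoping is ignored entirely, and `assign_slots` and the `slot` annotation are names invented for this example.

```python
import copy

def assign_slots(ir):
    """Return (annotated copy of ir, symbol table); the input tree is not touched."""
    symtab = {}                  # the "cheapest hash table": a plain dict
    out = copy.deepcopy(ir)      # annotate a fresh copy, not the original
    _walk(out, symtab)
    return out, symtab

def _walk(node, symtab):
    if not isinstance(node, list):
        return
    if node and node[0] == "let":
        # Assumed shape: ["let", name, init, body]; give each new name the next slot.
        symtab.setdefault(node[1], len(symtab))
    for i, child in enumerate(node):
        if i > 0 and isinstance(child, str) and child in symtab:
            # Replace a known variable name with an annotated node.
            node[i] = ["slot", symtab[child], child]
        else:
            _walk(child, symtab)

before = ["let", "x", 1, ["+", "x", 2]]
after, table = assign_slots(before)
print(before)
print(after)
print(table)
```

Since each phase hands back a new tree, you can print the IR before and after any phase and diff the two by eye, which is most of the debugging tooling you need at this stage.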

For the last 20+ years, Schemers have been losing their voices preaching the trivialization of compiler hacking. Listen to them; Schemers live in a parallel universe to the mainstream compiler community, which is still, even if it doesn't know it, hard at work improving the first Fortran compiler.

Have fun!

2 comments

I also like Kragen's Ur-Scheme as a concrete readably-small example of a self-hosted compiler to x86, inspired by the Ghuloum paper you reference. [At http://www.canonical.org/~kragen/sw/urscheme/ ]
Sadly, we don't all have these options. The project I'm working on right now, a prototype DSL for writing counters to check data, can't be written in a fancy language like ML, Haskell or, hell, even Python. They don't want any "weird languages" that someone else will have to maintain once I leave. So C/Lex/Yacc it is.
Prototype it in the language you know best, implement it in the language you're "required" to.

A typical compiler project takes at least 6 months from start to finish. Deliver something in Python in 2 weeks and see if they can resist it; you still have 5.5 months to flesh it out in C if you have to. I don't for a second buy that a software shop will refuse a working demo delivered immediately, even if it's in APL.

What about Clojure? It's a Lisp and can be used the Scheme way. At the same time, it runs on the Java virtual machine and can therefore be controlled directly from Java. That is, your legacy code can be written so that it can be maintained by Java programmers (this is to sell it to "them").
That might work if their concern is deployment of his code. But if their concern is actually maintenance (you know, patches, updates, bug fixes), then I don't see how Clojure, Scala, JPython, etc. would be any more acceptable. The concern is probably having legacy code in a language that nobody else on staff knows how to program in.
Correct: they will not be able to program in Clojure. But they should be able to call the classes created in Clojure from Java. Having no REPL environment and compiling all the code ahead of time is a requirement for this kind of legacy work, but it can be done easily. (Or "should" be done easily.)