I'm not the OP, but I sympathize. The specific details covered in a "classical" compilers course are heavyweight and not super-relevant right now. These days you don't have to understand LR parsing or touch a parser generator, you don't have to worry about register coloring, etc. Courses still use the Dragon Book, which is older than I am and covers a bunch of stuff only relevant to writing compilers for C on resource-constrained systems.
Instead, I figure a course should cover basics of DSL design, types and type inference, working with ASTs, some static analysis and a few other things. That has some overlap with a traditional compilers course, but a pretty different focus.
Not really. TAPL is a very useful book, but it won't teach you how to write a compiler, unless the only part of a compiler you actually care about is the type checker. The interpreters it describes (in the chapters titled “An ML implementation of <whatever>”) are ridiculously inefficient.
A good number of new toy-ish languages compile to another language (typically JavaScript), and introduce new semantics and new type rules. As the parent said, small DSLs.
You don't really need more than a type checker and AST transformations for that.
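To make that concrete, here's a rough sketch of the "type checker plus AST translation" shape for a made-up expression language that compiles to JavaScript source. Everything in it (the node classes Num, Bool, Add, If and the functions typecheck, to_js, compile_expr) is invented for illustration and not taken from any real project or from TAPL.

```python
from dataclasses import dataclass


# AST nodes for a hypothetical toy expression language.
@dataclass
class Num:
    value: int


@dataclass
class Bool:
    value: bool


@dataclass
class Add:
    left: object
    right: object


@dataclass
class If:
    cond: object
    then: object
    other: object


def typecheck(e):
    """Return the type name of an expression, or raise TypeError."""
    if isinstance(e, Num):
        return "number"
    if isinstance(e, Bool):
        return "boolean"
    if isinstance(e, Add):
        if typecheck(e.left) != "number" or typecheck(e.right) != "number":
            raise TypeError("'+' expects two numbers")
        return "number"
    if isinstance(e, If):
        if typecheck(e.cond) != "boolean":
            raise TypeError("'if' condition must be a boolean")
        then_type = typecheck(e.then)
        if then_type != typecheck(e.other):
            raise TypeError("'if' branches must have the same type")
        return then_type
    raise TypeError(f"unknown node: {e!r}")


def to_js(e):
    """Translate a (already type-checked) expression to JavaScript source."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Bool):
        return "true" if e.value else "false"
    if isinstance(e, Add):
        return f"({to_js(e.left)} + {to_js(e.right)})"
    if isinstance(e, If):
        return f"({to_js(e.cond)} ? {to_js(e.then)} : {to_js(e.other)})"
    raise TypeError(f"unknown node: {e!r}")


def compile_expr(e):
    """Type-check first, then emit JavaScript."""
    typecheck(e)
    return to_js(e)


print(compile_expr(If(Bool(True), Add(Num(1), Num(2)), Num(0))))
```

Something like Add(Num(1), Bool(True)) gets rejected by typecheck before any JavaScript is emitted, which is the whole point: the target language does the heavy lifting and the "compiler" only has to enforce the new rules.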
Finally, TAPL's type checkers are pretty good. They aren't designed for efficiency, though. They're designed to closely follow the book's contents: http://www.cis.upenn.edu/~bcpierce/tapl/checkers/
I mostly agree with you; however, SSA seems like overkill up until the point where your code becomes a tangled cyclomatic mess because of the lack of it (example[1]). I'd definitely include SSA in a modern course on compilers.
Hi, this seems like a good place for my newbie compiler question.
I'm trying to write a compiler for LISP. I wrote a simple interpreter already (mostly just to learn how compilers work). I definitely did not need anything complicated for tokenizing and parsing, thanks to https://github.com/lihaoyi/fastparse.
What resources are useful now for learning about implementing a type system and the optimizer? You said register coloring etc. aren't important. Is this because we can target LLVM?
An LL parsing library, or better yet, a library for PEG grammars. You know, to add another dependency to an already bloated program and to not care about having O(n) parsing time/memory.
Language Implementation Patterns by Terence Parr might be right up your alley. Implementing Programming Languages by Aarne Ranta is likewise a refreshing take on language implementation instruction. Bonus points for both books being rather concise and affordable.
I've also really enjoyed Essentials of Programming Languages by Friedman & Wand (older editions cover different topics than the current 3rd edition, and it may be worth reading both the 1st and 3rd editions). It focuses primarily on interpreters, but most of the topics covered have applications for compilers as well.
Going a slightly more traditional route, I've found Andrew Appel's compiler books, Modern Compiler Implementation in ML and Compiling with Continuations to present pretty wide coverage of techniques at all phases of a compiler. The former is the one I normally recommend for people looking for their first (and hopefully only) compiler book (if they're actually looking for a compiler book rather than a DSL book).
Of course, techniques for processing the Lisp family are well-covered in Christian Queinnec's classic Lisp in Small Pieces (he's written some other books which are quite interesting for Lisp historians as well). Implementors of Lisps will also probably want to look into The Art of the Metaobject Protocol by Kiczales et al. SICP's coverage in chapters 4 and 5 is also pretty good as an introduction, even if it only barely scratches the surface.
Getting into types, you're pretty much limited to Pierce's Types and Programming Languages, but Bob Harper et al. have also written Homotopy Type Theory which is freely available on the Web. I haven't finished HoTT yet, so I can't comment on how accessible it is or how broad/deep its coverage is.
Those interested in rewriting systems should start with Term Rewriting and All That by Baader and Nipkow, which I've found to be extremely accessible and fairly comprehensive (covering abstract rewriting systems, string rewriting, graph rewriting, etc.). Your choices for books on rewriting written in English are pretty limited (I've literally found only three, one of which costs $300 USD(!) -- where does Cambridge University Press get off??), but fortunately the Baader and Nipkow book is so good that you probably won't need another book and can proceed straight to the literature.
---
There are tons of books on compilation, but I honestly think they're mostly a waste of time. Each book, for the most part, only reiterates the basics in another way. If you didn't quite grasp the material from the first book you tried, maybe look at another one, but once you understand it, don't waste any more time or money on compiler books. After you've grokked the basics (i.e., worked through one or two introductory books), you'll get far more value from reading books on other related concepts (e.g., type theory, term rewriting, metaobject protocols, etc.), reading research papers, journal articles, dissertations, etc., and, of course, reading code.
I say all this as someone who made the mistake of spending lots of time and money on lots of different compiler books, stupidly hoping to learn something new with each one. The silver lining is that this puts me in a pretty good position to recommend particular compiler books :)
That the Dragon Book is still considered the standard introduction is baffling; even the most recent edition is horrendously outdated and dwells far too much on parsing techniques (where, from the perspective of a compiler writer, parsing is effectively a solved problem -- not to mention that there are quite a few parsing techniques the Dragon Book doesn't cover, despite the amount of text devoted to parsing). And, like you said, the techniques it covers are really only immediately applicable to imperative procedural languages without sophisticated type systems, such as C or Pascal.
They are still teaching undergrads how to make compilers that give incomprehensible error messages, etc. Compiler technology has the possibility of making a dent in the essential difficulty of getting computers to solve problems, and the compiler class is not oriented towards those needs.
There's nothing exciting about coding up a toy version of "Pascal" but it would be exciting if you could add the "unless" construction from Perl to Java in <1000 lines of code.
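As a rough idea of how small the core of such an extension can be, here's a sketch of the desugaring step alone: an "unless" node gets rewritten into an ordinary "if" with a negated condition before code generation. The node classes (Var, Not, Call, IfStmt, Unless) and the desugar/emit functions are all hypothetical, and this leaves out the hard part, which is hooking into a real Java parser.

```python
from dataclasses import dataclass


# Hypothetical AST nodes for a tiny statement language.
@dataclass
class Var:
    name: str


@dataclass
class Not:
    operand: object


@dataclass
class Call:
    func: str


@dataclass
class IfStmt:
    cond: object
    body: list


@dataclass
class Unless:
    cond: object
    body: list


def desugar(node):
    """Rewrite every Unless node into an IfStmt with a negated condition."""
    if isinstance(node, Unless):
        return IfStmt(Not(desugar(node.cond)), [desugar(s) for s in node.body])
    if isinstance(node, IfStmt):
        return IfStmt(desugar(node.cond), [desugar(s) for s in node.body])
    if isinstance(node, Not):
        return Not(desugar(node.operand))
    return node  # Var and Call have nothing to rewrite


def emit(node):
    """Print the desugared tree as Java-like source text."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Call):
        return f"{node.func}();"
    if isinstance(node, Not):
        return f"!({emit(node.operand)})"
    if isinstance(node, IfStmt):
        body = " ".join(emit(s) for s in node.body)
        return f"if ({emit(node.cond)}) {{ {body} }}"
    raise ValueError(f"cannot emit {node!r}; run desugar first")


print(emit(desugar(Unless(Var("done"), [Call("retry")]))))
```

The back end never has to know "unless" existed, which is why this kind of feature is cheap once the front end gives you a tree to rewrite.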
We are just now coming out of a dark age in compilers that was brought on by gcc.
Before gcc you could make a living writing compilers (i.e. Turbo C, Turbo Pascal, Turbo Prolog, ...) at a moderately sized company. Today you have gcc and you have compilers from the likes of Intel and Microsoft.
LLVM breathed some life into gcc since it is now modularized so that you can do interesting things on top of gcc.