Hacker News
Ask HN: Learning Modern Compilers?
47 points by splines_tines 1106 days ago
I recall reading a comment here at some point in the last year from someone on a compiler team who lamented how difficult it was to hire qualified people, because the practice of compiler construction differs so wildly from what is taught in university programs, or even in most recently published compiler books. Apparently modern compiler construction scarcely resembles what is taught in university courses based on the Dragon Book or similar, in both the higher-level architecture and the lower-level techniques and patterns.

I know that one recent innovation is that compilers have adopted a more service-oriented architecture, like the Roslyn compiler. This lets them not only compile your code but also, for instance, inform your text editor, linter, and similar tooling of syntax issues.
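
To make the "compiler as a service" idea concrete, here is a minimal sketch in Python, assuming a toy language whose only check is bracket matching. The names `CompilerService`, `update`, `diagnostics`, and `check_brackets` are hypothetical and are not Roslyn's API. The point is that the compiler becomes a long-lived object that an editor feeds text into and queries for structured diagnostics, with results cached per document version, instead of a process that prints errors and exits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    line: int       # 1-based line number
    column: int     # 1-based column number
    message: str


def check_brackets(text: str) -> list[Diagnostic]:
    """Toy 'compiler pass': report unmatched and unclosed brackets."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack: list[tuple[str, int, int]] = []
    diagnostics: list[Diagnostic] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in "([{":
                stack.append((ch, line_no, col))
            elif ch in closers:
                if stack and stack[-1][0] == closers[ch]:
                    stack.pop()
                else:
                    diagnostics.append(Diagnostic(line_no, col, f"unmatched '{ch}'"))
    # Anything still open at end of file was never closed.
    for ch, line_no, col in stack:
        diagnostics.append(Diagnostic(line_no, col, f"unclosed '{ch}'"))
    return diagnostics


class CompilerService:
    """Long-lived service: editors push document text, then query results."""

    def __init__(self) -> None:
        self._documents: dict[str, tuple[int, str]] = {}           # uri -> (version, text)
        self._cache: dict[str, tuple[int, list[Diagnostic]]] = {}  # uri -> (version, diagnostics)
        self.analyses_run = 0                                      # how often real work happened

    def update(self, uri: str, version: int, text: str) -> None:
        self._documents[uri] = (version, text)

    def diagnostics(self, uri: str) -> list[Diagnostic]:
        version, text = self._documents[uri]
        cached = self._cache.get(uri)
        if cached is None or cached[0] != version:
            # Only re-analyze when the editor has sent a newer version.
            self.analyses_run += 1
            cached = (version, check_brackets(text))
            self._cache[uri] = cached
        return cached[1]
```

For example, after `update("file:///a.toy", 1, "f(x")`, calling `diagnostics("file:///a.toy")` returns a single `Diagnostic` at line 1, column 2 for the unclosed `(`; asking again without a new version reuses the cached result.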

What are other differences? Is llvm still relevant outside of academia?

Are there any books, papers, or open source projects one could study to learn how compilers are built in this day and age?

Also: does the more abstract "programming language theory" popular in the more formal functional programming world (e.g. denotational semantics, lambda calculus, Floyd-Hoare logic, type theory, etc.; this sort of stuff[1]) have any relevance to compiler writers and language and language-tooling developers in industry?

[1] https://steshaw.org/plt/

7 comments

A (now six-year-old) discussion that might be helpful: Anders Hejlsberg on Modern Compiler Construction: https://www.youtube.com/watch?v=wSdV1M7n4gQ

Also https://github.com/salsa-rs/salsa

That video is largely what inspired me to post this actually :)

I'll check out salsa! Thank you!

> Is llvm still relevant outside of academia?

I am surprised by the question. In my experience of the industry, LLVM has become dominant to a degree which makes some of the work kind of boring.

I expect MLIR will eventually become similarly universal. In the compiler I am currently working on, we render the AST straight into an MLIR dialect - every node becomes an op - and the rest of the frontend is implemented in terms of MLIR passes.
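
A minimal sketch of the "every node becomes an op" idea, in Python, assuming a tiny expression AST and a hypothetical dialect named `mylang`. It only prints text in MLIR's generic operation syntax; it does not use MLIR's actual C++ or Python APIs, through which a real frontend would build the ops and then run passes over them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str          # "add" or "mul" in this toy language
    lhs: Expr
    rhs: Expr


Expr = Union[Num, BinOp]


def lower_to_ops(expr: Expr) -> list[str]:
    """Emit one generic-syntax op per AST node, in post-order (operands first)."""
    ops: list[str] = []

    def visit(node: Expr) -> str:
        if isinstance(node, Num):
            result = f"%{len(ops)}"   # SSA name = index of the op that defines it
            ops.append(f'{result} = "mylang.const"() {{value = {node.value} : i64}} : () -> i64')
            return result
        lhs = visit(node.lhs)
        rhs = visit(node.rhs)
        result = f"%{len(ops)}"
        ops.append(f'{result} = "mylang.{node.op}"({lhs}, {rhs}) : (i64, i64) -> i64')
        return result

    visit(expr)
    return ops
```

For `1 + 2 * 3` (built as `BinOp("add", Num(1), BinOp("mul", Num(2), Num(3)))`) this yields five ops, one per AST node, ending with `%4 = "mylang.add"(%0, %3) : (i64, i64) -> i64`.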

Generating "good" machine code from MLIR requires a huge amount of effort. With the numerous variants of instruction sets (even within a single line, e.g. x86-64), the optimisations and instruction selection are a complex task.

It did not seem especially burdensome last time I tried it. Are you familiar with the LLVM dialect of MLIR?
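
To illustrate why instruction selection depends on the target, here is a hedged sketch of greedy "maximal munch" tree tiling in Python. It emits Intel-syntax-like pseudo x86-64 over virtual registers (`t0`, `t1`, ...) and assumes variables already sit in registers named after them; there is no register allocation, and the tile set is a made-up subset. The one interesting tile folds `a + b*s` (with `s` in 1, 2, 4, 8) into a single `lea`, which only works because x86 addressing modes support exactly those scale factors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Var:
    name: str        # assumed to already live in a register of the same name


@dataclass
class Const:
    value: int


@dataclass
class Add:
    lhs: object
    rhs: object


@dataclass
class Mul:
    lhs: object
    rhs: object


class Selector:
    def __init__(self) -> None:
        self.code: list[str] = []
        self.next_reg = 0

    def fresh(self) -> str:
        reg = f"t{self.next_reg}"
        self.next_reg += 1
        return reg

    def select(self, node) -> str:
        """Emit code for node and return the register holding its value."""
        if isinstance(node, Var):
            return node.name
        if isinstance(node, Const):
            reg = self.fresh()
            self.code.append(f"mov {reg}, {node.value}")
            return reg
        # Largest tile first: a + b*s with s in {1, 2, 4, 8} fits one lea.
        if (isinstance(node, Add) and isinstance(node.rhs, Mul)
                and isinstance(node.rhs.rhs, Const) and node.rhs.rhs.value in (1, 2, 4, 8)):
            base = self.select(node.lhs)
            index = self.select(node.rhs.lhs)
            reg = self.fresh()
            self.code.append(f"lea {reg}, [{base} + {index}*{node.rhs.rhs.value}]")
            return reg
        # Fallback tiles: copy lhs, then add/imul an immediate or a register.
        mnemonic = "add" if isinstance(node, Add) else "imul"
        lhs = self.select(node.lhs)
        reg = self.fresh()
        self.code.append(f"mov {reg}, {lhs}")
        if isinstance(node.rhs, Const):
            self.code.append(f"{mnemonic} {reg}, {node.rhs.value}")
        else:
            rhs = self.select(node.rhs)
            self.code.append(f"{mnemonic} {reg}, {rhs}")
        return reg
```

With `Add(Var("a"), Mul(Var("b"), Const(4)))` the selector emits the single instruction `lea t0, [a + b*4]`; changing the scale to 3 falls back to four instructions (`mov`, `mov`, `imul`, `add`). A different ISA would need a different tile set, which is where much of the per-target effort goes.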

I started searching for the same thing a couple of years ago after I saw the same Hejlsberg interview posted by sarosh. The only recommendation I was able to get was to read the Roslyn source code and the LSP reference. I could not find any reference on, for example, how one would build a parser that incrementally modifies the AST as you type. As for PLT, I would think those subjects have to do with the semantics of the language and are orthogonal to the technology used to build the compiler. But I'm not an expert, and could easily be wrong...
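
One simple (and much coarser than production) way to avoid reparsing everything on each keystroke is to split the file into top-level chunks and memoize each chunk's parse by its text, so an edit only reparses the chunk it touched. The sketch below assumes chunks are separated by blank lines and uses a stand-in `parse_item` that just splits tokens; production incremental parsers such as tree-sitter reuse unchanged subtrees at a much finer granularity.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalParser:
    # Parsed items memoized by their exact source text; never evicted in this sketch.
    cache: dict = field(default_factory=dict)
    parse_calls: int = 0

    def parse_item(self, chunk: str) -> tuple:
        """Stand-in for a real parser: one 'node' per whitespace-separated token."""
        self.parse_calls += 1
        return tuple(chunk.split())

    def parse(self, source: str) -> list:
        """Parse a whole file, reusing results for chunks whose text is unchanged."""
        chunks = [c.strip() for c in source.split("\n\n") if c.strip()]
        tree = []
        for chunk in chunks:
            if chunk not in self.cache:
                self.cache[chunk] = self.parse_item(chunk)
            tree.append(self.cache[chunk])
        return tree
```

Parsing `"fn a() {}\n\nfn b() {}"` costs two `parse_item` calls; editing only the second item and parsing again costs one more, because the first chunk's text is unchanged.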

Thanks, that makes sense re PLT. One thing I have heard is that PLT can be viewed as a "programming languages" approach to the foundations of math, or at least the foundations of computation, and that it largely doesn't have to do with programming language implementation, but not everyone seems to share that view. I also recently saw a comment from someone who said they worked on compilers listing abstract interpretation as something budding (industrial) compiler engineers should learn[1].

I have a math background, so of course I'm drawn to this stuff, but I have trouble imagining it being immediately useful for implementers. Could it be that this stuff is more useful for language design than for implementation?

[1] https://news.ycombinator.com/item?id=20915485
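
For readers unfamiliar with abstract interpretation: the idea is to run a program over abstract values (here, just the sign of each integer) instead of concrete ones, so a compiler can prove facts that hold for every input. A minimal Python sketch, assuming straight-line arithmetic only (no loops, so no widening) and a simplified sign lattice without a bottom element:

```python
from enum import Enum


class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "?"   # "could be any sign": the analysis lost precision


def abstract(n: int) -> Sign:
    """Abstraction function: map a concrete integer to its sign."""
    if n < 0:
        return Sign.NEG
    return Sign.ZERO if n == 0 else Sign.POS


def sign_add(a: Sign, b: Sign) -> Sign:
    if a is Sign.ZERO:
        return b
    if b is Sign.ZERO:
        return a
    if a is b and a is not Sign.TOP:
        return a            # pos + pos = pos, neg + neg = neg
    return Sign.TOP         # pos + neg, or anything + unknown, could be any sign


def sign_mul(a: Sign, b: Sign) -> Sign:
    if Sign.ZERO in (a, b):
        return Sign.ZERO
    if Sign.TOP in (a, b):
        return Sign.TOP
    return Sign.POS if a is b else Sign.NEG


def join(a: Sign, b: Sign) -> Sign:
    """Merge facts from two control-flow paths (e.g. both arms of an if)."""
    return a if a is b else Sign.TOP


def evaluate(expr, env: dict) -> Sign:
    """expr is an int literal, a variable name, or ("add" | "mul", lhs, rhs)."""
    if isinstance(expr, int):
        return abstract(expr)
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    a, b = evaluate(lhs, env), evaluate(rhs, env)
    return sign_add(a, b) if op == "add" else sign_mul(a, b)
```

With `x` known to be negative, `evaluate(("add", ("mul", "x", "x"), 1), {"x": Sign.NEG})` gives `Sign.POS`, enough to prove that dividing by that expression can never divide by zero. With `x` unknown (`Sign.TOP`) the same call gives `Sign.TOP`, even though `x*x + 1` is always positive: the analysis stays sound but loses precision, and richer domains such as intervals trade extra cost for more precision.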

I don't know what "abstract interpretation" is, so I can't help you with that. If you are interested in real-life implementations of theoretical PLT concepts, I'm not sure if this will help, but there is this page: https://www.ponylang.io/blog/2017/05/an-early-history-of-pon... where the designer of the Pony language tells how he did a Ph.D. on sound stuff and then went on to design Pony. You can probably start hunting for papers from that trail. Personally, I'm nostalgic for the Wirthian languages, and I started developing a PL/0 compiler with an LLVM backend, hoping that someday, if I ever develop all the needed skills, I can put all the Wirthian languages in a cocktail shaker, add a dash of new PL concepts, and see how that tastes.

I don't think LLVM is going to become irrelevant; rather, most of the new languages we see depend on LLVM IR. That helps them reach multiple architectures, which is even more important now that Apple has moved from x86 to ARM.

Writing two backends for two different architectures would be a lot of work, followed by lots of platform-specific optimizations. LLVM is therefore the present, and it looks set to remain so for the foreseeable future.
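
A hedged sketch of why one IR saves writing per-architecture backends: the Python function below (a hypothetical helper, not part of any LLVM binding) turns a small expression tree into textual LLVM IR. The same `.ll` file can then be handed to LLVM's `llc` with different `-mtriple` values, for example `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu`, and LLVM's backends handle instruction selection and target-specific optimization.

```python
from __future__ import annotations


def emit_llvm_function(name: str, params: list[str], expr) -> str:
    """Emit textual LLVM IR for a function returning i64.

    expr is a parameter name (str), an integer literal (int), or a tuple
    (opcode, lhs, rhs) where opcode is an LLVM integer binop such as "add" or "mul".
    """
    body: list[str] = []

    def visit(node) -> str:
        if isinstance(node, str):
            return f"%{node}"            # function argument
        if isinstance(node, int):
            return str(node)             # integer constants can appear inline
        opcode, lhs, rhs = node
        a, b = visit(lhs), visit(rhs)
        result = f"%t{len(body)}"        # fresh SSA value name
        body.append(f"  {result} = {opcode} i64 {a}, {b}")
        return result

    result = visit(expr)
    args = ", ".join(f"i64 %{p}" for p in params)
    return "\n".join([f"define i64 @{name}({args}) {{", "entry:", *body, f"  ret i64 {result}", "}"])
```

For `emit_llvm_function("f", ["a", "b"], ("add", "a", ("mul", "b", 4)))` the result is a `define i64 @f(i64 %a, i64 %b)` function whose body is `%t0 = mul i64 %b, 4`, `%t1 = add i64 %a, %t0`, `ret i64 %t1`. Nothing in it mentions x86 or ARM.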

Besides that, we (or at least I) were taught, or it was implied, that handwritten recursive descent parsers are an inferior way of doing things. Real-world usage should always start with a grammar handed to a parser generator (bison etc.) that in turn uses table-driven LR(1) algorithms.

It turns out that a lot of widely used programming languages are moving away from LR(1) and using handwritten recursive descent parsers instead.
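
For contrast with the table-driven approach, here is a minimal handwritten recursive descent parser in Python for integer arithmetic, loosely in the style of Wirth's PL/0 expression grammar but simplified (no unary minus, no identifiers; `/` uses Python's floor division). Each grammar rule becomes one method, precedence falls out of which method calls which, and errors are raised exactly where parsing fails, one commonly cited reason handwritten parsers make good error messages easier. For brevity it evaluates while parsing instead of building an AST.

```python
import re

TOKEN_RE = re.compile(r"\d+|[-+*/()]")


class ParseError(Exception):
    pass


def tokenize(src: str) -> list:
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise ParseError(f"unexpected character {src[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.i += 1
        return token

    def parse(self) -> int:
        value = self.expression()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r} after expression")
        return value

    # expression = term { ("+" | "-") term }
    def expression(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term = factor { ("*" | "/") factor }
    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    # factor = number | "(" expression ")"
    def factor(self) -> int:
        token = self.advance()
        if token == "(":
            value = self.expression()
            if self.advance() != ")":
                raise ParseError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise ParseError(f"expected a number or '(', got {token!r}")
```

`Parser("2 + 3 * (4 - 1)").parse()` returns 11, and the `while` loops make `-` and `/` left-associative, so `8 / 2 / 2` gives 2.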

Ruby seems to be switching away from its generated LR(1) parser; more details here [0].

EDIT: References.

[0] https://railsatscale.com//2023-06-12-rewriting-the-ruby-pars...

I'm working on a compiler for a DSL which is based on Roslyn. The most useful things for me learning wise have been:

1) Reading the source code of Roslyn, which can be quite readable.
2) Building VS Code extensions to add diagnostics and implement code actions. You can use any open source language server as a reference.
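
To see what "adding diagnostics" amounts to at the protocol level, here is a hedged Python sketch of the message a language server sends: a JSON-RPC `textDocument/publishDiagnostics` notification framed with a `Content-Length` header, as described in the Language Server Protocol specification. The helper names and the `"mydsl"` source label are hypothetical. Positions are zero-based, and by default `character` counts UTF-16 code units, which this sketch does not convert. A real server would write the returned bytes to stdout.

```python
import json


def make_diagnostic(line: int, start: int, end: int, message: str) -> dict:
    """One LSP Diagnostic; line/character are zero-based, severity 1 = Error."""
    return {
        "range": {
            "start": {"line": line, "character": start},
            "end": {"line": line, "character": end},
        },
        "severity": 1,
        "source": "mydsl",   # hypothetical name shown in the editor's problem list
        "message": message,
    }


def publish_diagnostics(uri: str, diagnostics: list) -> bytes:
    """Frame a publishDiagnostics notification (no "id": it expects no reply)."""
    message = {
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {"uri": uri, "diagnostics": diagnostics},
    }
    body = json.dumps(message).encode("utf-8")
    # The header counts bytes of the UTF-8 body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```

Sending an empty `diagnostics` list for a URI is how a server clears previously reported problems for that file.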

How does one find jobs in compilers?

Would be curious about this, too, if folks could share!