Hacker News new | ask | show | jobs
by munificent 53 days ago
> I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features.

Performance.

You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.

But I also accept that that choice means there's an upper limit to how fast my compiler will. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.

When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.

6 comments

> But I also accept that that choice means there's an upper limit to how fast my compiler will.

Don't buy it.

A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs so it might even be quite a bit faster on big codebases.

Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.

And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.

I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.

> I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues

Agree, that many people are using languages out of context to what they are actually trying to do. In many cases, using a GC language would be far more productive for them. Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.

> Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.

Sure, that is tautologically true.

However, I maintain that the original author would have gotten much further even with a pathologically slow Python implementation. In particular, munging all the low-level stuff like linking is going to have full-on libraries that you could pass off the task to. You can then come back and do it yourself later.

For me, reaching a point that helps reinforce my motivation is BY FAR the most relevant consideration for projects. Given the original article, it seems like I'm not alone.

For context, this is the author of https://craftinginterpreters.com/ , and https://gameprogrammingpatterns.com/ , and https://journal.stuffwithstuff.com/2015/09/08/the-hardest-pr... .

I promise that he knows a thing or two about compilers and performance!

For what it's worth, I agree with him. A recent example is the porting of the TypeScript compiler to Go: it hasn't been fully released yet, but people are already going wild for its performance improvement over the original in-TS compiler.

Of course, it took them over a decade to reach the point where a port was necessary - so it's up to you to decide when that decision makes sense for your language.

I think once you get the design of the IR right and implement it relatively efficiently, an optimizing compiler is going to be complicated enough that tweaking the heck out of low-level data structures won't help much. (For a baseline compiler, maybe...but).

E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.

The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told I went overboard optimizing[1] the x86 backend and it's significantly faster (maybe 2x) than the nicer, more pretty x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.

[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!

Does “low level” translate to performance? Is Rust a “low level” language?

Take C#. You can write a compiler in it that is very fast. It gives you explicit control over memory layout of data structures and of course total control over what you wrote to disk. It is certainly not “low level”.

> It gives you explicit control over memory layout of data structures

Some with structs, yes. But overall it doesn't give you much control over where things up in memory once references get involved compared to C, C++, or Rust.

>Having a garbage collector makes so many compiler algorithms and data structures easier.

Does it really? Compilers tend to be programs that just appends a bunch of data to lists, hashmaps, queues and trees, processes it, then shuts down. So you can just make append-only data structures and not care too much about freeing stuff.

I never worry about memory management when I write compilers in C.

> Compilers tend to be programs that just appends a bunch of data to lists, hashmaps, queues and trees, processes it, then shuts down.

This is true if you're writing a batch mode compiler. But if you're writing a compiler that is integrated into an IDE where it is continuously updating the semantic state based on user edits, there is a whole lot more mutability going on and no clear time when you can free everything.

Any AOT compiled language offers enough performance for writing full compiler toolchain.