Hacker News
by vertexmachina 1464 days ago
I'm in the middle of this book at the moment and I have mixed feelings on it.

It's definitely well-written, and you can feel the love and care that went into producing it. But I think it would have been stronger had Nystrom skipped the Java version and spent those pages on theory before jumping into the C implementation. While going through the Java part (which I implemented in C# instead, because I have an emetic reaction to Java), I found myself just wanting to hurry up and get to the good stuff in C. And I found the visitor pattern and code generation stuff to be a distraction.
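For anyone who hasn't run into it, this is roughly what the visitor pattern looks like in Java: each AST node class gets an `accept` method, and each operation over the tree (evaluating, printing) becomes its own class implementing a `Visitor` interface. This is only a minimal sketch; the names `Expr`, `Literal`, `Binary`, `Evaluator` and `Printer` are made up for illustration and are not the book's code. The book generates its node classes with a small script, which is the code generation part.

```java
// Minimal sketch of the Visitor pattern over a tiny expression AST.
// All class names here are hypothetical, not taken from the book.
public class VisitorSketch {
    interface Visitor<R> {
        R visitLiteral(Literal expr);
        R visitBinary(Binary expr);
    }

    static abstract class Expr {
        // Double dispatch: the node calls the visitor method for its own type.
        abstract <R> R accept(Visitor<R> visitor);
    }

    static final class Literal extends Expr {
        final double value;
        Literal(double value) { this.value = value; }
        <R> R accept(Visitor<R> visitor) { return visitor.visitLiteral(this); }
    }

    static final class Binary extends Expr {
        final Expr left;
        final char operator;
        final Expr right;
        Binary(Expr left, char operator, Expr right) {
            this.left = left; this.operator = operator; this.right = right;
        }
        <R> R accept(Visitor<R> visitor) { return visitor.visitBinary(this); }
    }

    // One operation over the tree: evaluation.
    static final class Evaluator implements Visitor<Double> {
        public Double visitLiteral(Literal expr) { return expr.value; }
        public Double visitBinary(Binary expr) {
            double l = expr.left.accept(this);
            double r = expr.right.accept(this);
            switch (expr.operator) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: throw new IllegalArgumentException("unknown operator " + expr.operator);
            }
        }
    }

    // A second operation, added without touching the node classes: printing.
    static final class Printer implements Visitor<String> {
        public String visitLiteral(Literal expr) { return String.valueOf(expr.value); }
        public String visitBinary(Binary expr) {
            return "(" + expr.operator + " " + expr.left.accept(this) + " " + expr.right.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Expr e = new Binary(new Binary(new Literal(1), '+', new Literal(2)), '*', new Literal(4));
        System.out.println(e.accept(new Printer()));   // (* (+ 1.0 2.0) 4.0)
        System.out.println(e.accept(new Evaluator())); // 12.0
    }
}
```

The payoff is that a new operation is just a new visitor class. The cost is the `accept`/`visitX` boilerplate on every node type, which is why generating it is tempting.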

The code can also be a bit hard to follow at times because he jumps around from file to file and function to function. He'll present a function that calls a function that doesn't exist yet, then define the new function, and so on. He does it that way because he wanted every single line of code to appear in the book (he wrote a tool to verify that), and it certainly helps if you're trying to copy his code verbatim, but not so much for high-level understanding. Maybe that's a failing on my part.

Finally, I wish he had gone into more detail on how one might implement a register-based VM, because they are often faster and more interesting (to me) than stack-based ones.
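To make the difference concrete, here is a toy sketch that evaluates `(a + b) * c` both ways. The opcodes and encodings are hypothetical and much simpler than clox's instruction set. The stack version dispatches six instructions because operands live implicitly on the stack; the register version dispatches three because each instruction names its destination and source registers. Fewer dispatches per unit of work is the usual argument for register VMs, traded against wider instructions and a compiler that has to assign registers.

```java
// Toy comparison of stack-based vs register-based bytecode for (a + b) * c.
// Opcodes and encodings are hypothetical, not clox's instruction set.
public class StackVsRegister {
    // ---- Stack VM: operands are implicit (top of stack). ----
    static final int PUSH_LOCAL = 0, ADD = 1, MUL = 2, RETURN = 3;

    static int runStack(int[] code, int[] locals) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH_LOCAL: stack[sp++] = locals[code[pc++]]; break;
                case ADD: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; break; }
                case MUL: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a * b; break; }
                case RETURN: return stack[--sp];
                default: throw new IllegalStateException("bad opcode");
            }
        }
    }

    // ---- Register VM: each instruction is (op, dst, src1, src2). ----
    static final int R_ADD = 0, R_MUL = 1, R_RETURN = 2;

    static int runRegister(int[] code, int[] regs) {
        int pc = 0;
        while (true) {
            switch (code[pc]) {
                case R_ADD: regs[code[pc + 1]] = regs[code[pc + 2]] + regs[code[pc + 3]]; pc += 4; break;
                case R_MUL: regs[code[pc + 1]] = regs[code[pc + 2]] * regs[code[pc + 3]]; pc += 4; break;
                case R_RETURN: return regs[code[pc + 1]];
                default: throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        int[] vars = {2, 3, 4}; // a, b, c
        // Stack: PUSH a, PUSH b, ADD, PUSH c, MUL, RETURN (6 dispatches).
        int[] stackCode = {PUSH_LOCAL, 0, PUSH_LOCAL, 1, ADD, PUSH_LOCAL, 2, MUL, RETURN};
        // Register: r3 = r0 + r1; r3 = r3 * r2; return r3 (3 dispatches).
        int[] regCode = {R_ADD, 3, 0, 1, R_MUL, 3, 3, 2, R_RETURN, 3};
        int[] regs = {2, 3, 4, 0}; // r0..r2 hold a, b, c; r3 is a temporary
        System.out.println(runStack(stackCode, vars));  // 20
        System.out.println(runRegister(regCode, regs)); // 20
    }
}
```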


The Java part will probably turn out to be very handy if he ever writes a second edition, though. The reason is that for many dynamic languages, the best way to make an interpreter fast is now to use the Truffle framework, not to write a bytecode interpreter in C. Truffle changes the whole space of language interpreters so radically that it should definitely get a mention in any future take on the topic.

With Truffle, you start with a Java-based tree-walking interpreter (you could use Kotlin too, for less boilerplate), which is the very easiest kind to write. Then you annotate it in some places, add a dependency on Truffle and ... that's it. Now you have a JIT compiler for your interpreted language. The next step is to refine your use of the annotations and write specializations that speed up your JIT by incorporating standard techniques like polymorphic inline caches.
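A polymorphic inline cache is simpler than it sounds. The sketch below is plain Java and does not use Truffle's API; it writes out by hand the kind of caching that Truffle specializations let you express with annotations. It assumes objects carry a "shape" (a hidden class mapping property names to slot indices), and `Shape`, `DynObject` and `PropertyReadSite` are hypothetical names. Each property-read site remembers the last few shapes it has seen and the slot each maps to, so repeated reads skip the hash lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written sketch of a polymorphic inline cache (PIC) for property reads.
// Plain Java; all names are hypothetical and no Truffle API is used.
public class InlineCacheSketch {
    // A "shape" (hidden class) maps property names to slot indices.
    static final class Shape {
        final Map<String, Integer> slots = new HashMap<>();
        Shape(String... names) {
            for (int i = 0; i < names.length; i++) slots.put(names[i], i);
        }
    }

    static final class DynObject {
        final Shape shape;
        final Object[] values;
        DynObject(Shape shape, Object... values) { this.shape = shape; this.values = values; }
    }

    // One cache per read site in the program, e.g. each "obj.x" expression.
    static final class PropertyReadSite {
        static final int MAX_ENTRIES = 4; // beyond this, new shapes are not cached
        final String name;
        final Shape[] cachedShapes = new Shape[MAX_ENTRIES];
        final int[] cachedSlots = new int[MAX_ENTRIES];
        int size = 0;
        int slowLookups = 0; // counts full hash-map lookups, for illustration

        PropertyReadSite(String name) { this.name = name; }

        Object read(DynObject obj) {
            // Fast path: compare the object's shape by identity with cached shapes.
            for (int i = 0; i < size; i++) {
                if (cachedShapes[i] == obj.shape) return obj.values[cachedSlots[i]];
            }
            // Slow path: full lookup, then remember the result if there is room.
            slowLookups++;
            Integer boxed = obj.shape.slots.get(name);
            if (boxed == null) throw new RuntimeException("Undefined property '" + name + "'.");
            int slot = boxed;
            if (size < MAX_ENTRIES) {
                cachedShapes[size] = obj.shape;
                cachedSlots[size] = slot;
                size++;
            }
            return obj.values[slot];
        }
    }

    public static void main(String[] args) {
        Shape point = new Shape("x", "y");
        Shape labeled = new Shape("label", "x");
        PropertyReadSite site = new PropertyReadSite("x");
        DynObject[] objs = {
            new DynObject(point, 1, 2),
            new DynObject(labeled, "a", 10),
            new DynObject(point, 3, 4),
            new DynObject(labeled, "b", 20),
        };
        int sum = 0;
        for (DynObject o : objs) sum += (Integer) site.read(o);
        System.out.println(sum);              // 34
        System.out.println(site.slowLookups); // 2: one per distinct shape
    }
}
```

Only the first read of each shape takes the slow path. A site that sees more than `MAX_ENTRIES` shapes keeps working but stops caching new ones, a crude stand-in for the megamorphic fallback real engines use.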

Finally, you can compile your new interpreter+JIT engine down to a GraalVM native image (a single standalone native binary), so it can start as fast as an interpreter written in C and warm up as fast as V8. The binary can also be used as a shared library.

Given that this tech exists now, people who choose to write their interpreter in Java will have a massive edge in performance over people walking the traditional path. It's still relatively new technology but it's hard to see how it doesn't change interpreter design forever.

I sympathize with your criticisms of Java, because the language is... not my favorite. It helps to see the choice as Nystrom's answer to several overlapping constraints:

- Manual Memory Management Is Hard. Interpreters are complex pieces of software; you don't need another rabbit hole to dive into while you're learning to write your first parser. You don't want to agonize over where to put the contents of the file buffer you're parsing before you've written your first lexing switch. People spend years with C and C++ and still get manual memory management wrong. The book is supposed to be fun.

- Data Structures Are Hard. This doesn't apply to C++ or really any modern language, since their standard libraries ship hash tables and growable arrays, but since you wanted it done in C the first time, that would entail the obligatory "implement your own universe from scratch" exercise C is infamous for. I don't mind; I always like implementing my own universes (although I despise C even more than Java; it can't be overemphasized how badly engineered that language is). But again, pedagogy says you should introduce the minimum possible surface area when approaching a new topic, ideally one topic at a time.

- Interpreters Should Be Fast, so overly dynamic languages like Python and JavaScript are out.

- A teaching language should be popular and familiar. The obvious benefit is accessibility to as many learners as possible, but a less obvious one is the availability of tools and cross platform support.

Out of the vast array of available programming languages and their toolchains, requiring GC, a powerful standard library and reasonable performance all at once excludes a whole lot. The popularity requirement basically leaves only Java, C# and Golang standing.

That's a really good summary. I didn't pick C# in part because it feels more tied to Microsoft and Windows than I wanted the book's language to be. Java (to me at least) feels fairly platform and corporation independent.

If you would have preferred that I pick Go, you'll definitely like Thorsten Ball's two books.

Choosing a language for books is really hard these days. There are so many to choose from, and most are quite large and complex, so it's hard to find a single language that is familiar to a large enough segment of the audience.

Eh, you don't have to use Java for the first part. I didn't, and I've seen many other people in discussions of the book say they didn't either. It's readable enough, and the explanations are clear enough, that you can follow along in any other memory-managed language that you're comfortable with.