| As someone with years of experience writing interpreters, this seems a rather sloppy solution to a straightforward problem. There are two canonical solutions, and he uses neither: 1. The first canonical solution, which the author even describes, is persistent environments. Unfortunately, this requires passing the current environment as part of the recursive pattern, which means heavily modifying the existing code. He doesn't do this because it would require revisiting too much. 2. The second canonical solution, which you would find in any modern compiler, is to uniquely rename all the variables. His "resolver" traversal has ample opportunity to do this, and would provide a far cleaner solution (with less space overhead than his expr/scope depth lookup table). Instead, the author creates a stack of environments and annotates variables with information akin to de Bruijn indicies for scopes. Compared to the alternatives above, this is over-engineered and inelegent, plus it complicates reasoning about scopes far beyond what's necessary. As an aside, this assertion is also completely absurd: > Shadowing is rare and often an error so initializing a shadowing variable based on the value of the shadowed one seems unilkely [sic] to be deliberate. |
Right. When I was first writing the code for this interpreter, I implemented it using persistent environments (basically the typical Scheme association list approach).
It worked, but it had some strikes against it:
1. The persistent approach is not right for global variables, which are dynamically bound. So I ended up needing two Environment classes, a Map-based one for the global scope, and then a persistent one for locals. That, of course, also requires an interface so that most code can work polymorphically with both types.
2. The Interpreter class and its visit methods are introduced several chapters before Environment. So all of those preceding visit() methods have to be redone to have the extra environment parameter passed along with passing it through some other methods.
Storing the current environment in a field helped, but the code for updating that field still looked grungy to me. With a book, every bit of extra boilerplate feels really heavy and I try to keep the code clean and simple.
3. It becomes really unclear why we want persistent local environments. Since we would have to introduce them well before the point in the book when closures can actually cause problems when not having them, it ended up feeling like the code was poorly motivated.
If I did persistent local scopes when locals are first introduced, there's no way to show them a sample program that would go back without having them — we don't have functions, function calls, or closures yet.
The current organization lets the reader go down a naive obvious-seeming path (and, better, one that reuses all the code we already need for global scopes) and then lets them viscerally experience the problem caused by thinking about blocks as a single scope. Then once they feel that pain, they get the solution.
There are some benefits to the current approach:
1. It gave me an opportunity to show an example of a semantic analysis, and adding another pass to a compiler. Those are generally useful techniques, and I think it's worth walking readers through one.
2. It lets the reader cover more ground. When we get to the second interpreter, it takes a different approach to local variables. Variables are resolved during parsing and stored directly on the interpreter's stack, and accessed by index. The block scopes are discarded and have no runtime representation.
I've tried to add other differences between the two interpreters too, just to reduce the amount of redundancy between them. For example, the Java one lexes the entire file to a list of tokens while the C one lexes on demand, driven by the parser. The Java one is an AST walker, the C one compiles to bytecode, etc.
Of course, if you are an experience language hacker, it means some stuff in the Java interpreter may seem weird because it's not the "normal" way to do things. (Though, for what it's worth, I have seen plenty of interpreters that do create hashtable-based environments for each lexical scope.)
I hope that seems reasonable. The way environments are represented in the Java interpreter was the most difficult design decision I made. I went back and forth on it a lot and I'm still not certain I made the right choice. But, ultimately, if the book is ever going to exist, I had to just pick and move forward.
> 2. The second canonical solution, which you would find in any modern compiler, is to uniquely rename all the variables. His "resolver" traversal has ample opportunity to do this, and would provide a far cleaner solution (with less space overhead than his expr/scope depth lookup table).
I considered having a problem exercise to do effectively that, but I felt like it might be reaching a little too far for a first-time language implementer.
What is absurd about it? Did I not word this well? The point I was trying to get across is that code like this is not common: I can't recall ever seeing code like this in the wild, and if I ever saw it in a code review, I would certainly tell the author to rename the local variable.Given that, it seems reasonable to me to not take the approach of deferring putting the local in scope until after its initializer has run.
Note that the above (or equivalent) code is a compile error in Java and C#. In JavaScript, using "let", it's an error. In C, it accesses the uninitialized new variable (!).