Hacker News new | ask | show | jobs
by rohwer 3704 days ago
Dybvig's compiler course was exemplary. Say what you may about Scheme, you learned so much more in those classes. His Scheme Programming Language book is highly recommended. Especially check out his extended examples chapter: http://www.scheme.com/tspl4/examples.html#./examples:h0
5 comments

Thanks for this! I worked on implementing a type-inference algorithm last week and I wish I had stumbled upon the chapter on unification[1] earlier.

[0] - https://github.com/prakhar1989/type-inference

[1] - http://www.scheme.com/tspl4/examples.html#./examples:h10

I thought scheme was like the beatles - people universally only had good things to say about it.
I might not be getting a reference here, but FWIW, I had a recent experience with Scheme that was interesting.

I did SICP nearly 19 years ago as a freshman, in 1997. And then a few months ago, I ported the metacircular-evaluator -- the "crown" of the course -- to femtolisp (the Lisp implementation underlying Julia).

My thoughts were:

1) It sure is awkward to represent struct fields 1 2 3 as (cdr struct), (cadr struct), (caddr), ... Yes this is nice to show that car and cdr are all you need as axiomatic primitives, but for practical purposes it's annoying. You end up with lots of little functions with long names.

2) Scheme code is very imperative! Even the metacircular evaluator uses set-cdr! and so forth. I don't like imperative code with the Lisp syntax.

3) It is awkward to represent environments with assoc lists. I feel that having a language which is really bootstrapped requires some kind of hash table/dictionary. Because you need that to implement scopes with O(1) access rather than O(n). I believe there are experimental lisps that try to fix this.

4) Macros also seem to have a needlessly different syntax than regular functions. There are Lisps with f-exprs rather than s-exprs that try to fix this: https://en.wikipedia.org/wiki/Fexpr

I was surprised by #3 and #4 -- it some sense Scheme is less "meta" and foundational than it could be. #2 is also a fundamental issue... at least if you want to call it the "foundation" of computing and build Lisp machines; I think this is evidence that this idea is a fundamentally flawed. #1 just makes it pale into languages like Python or even JavaScript.

>1) It sure is awkward to represent struct fields 1 2 3 as (cdr struct), (cadr struct), (caddr)

Every Scheme implementation I know of supports record types aka SRFI-9[0]. No one actually makes new data types from cons cells.

>2) Scheme code is very imperative!

Scheme supports many programming paradigms. Imperative programming is one. Functional programming, object-oriented programming, and relational programming are others. Not all Scheme code is imperative.

>3) It is awkward to represent environments with assoc lists.

Every Scheme implementation I know of has traditional mutable hash tables. Sometimes you want a hash table, sometimes you want an alist. It depends.

>4) Macros also seem to have a needlessly different syntax than regular functions.

I don't really understand this point. Are you talking about syntax-rules? If so, then I must disagree. syntax-rules is a very elegant language for defining hygienic macros.

SICP does not teach you everything that Scheme has to offer, and the Scheme implementations of the 1980s are a lot different than the Scheme implementations of 2016.

[0] http://srfi.schemers.org/srfi-9/srfi-9.html

OK, point taken about #1. It is valuable to have the basic axioms and then separate syntactic sugar.

But #2 and #3 are what I would call bootstrapping problems... in other words, there is a reason that C is the foundation of computing rather than Lisp. I don't think anybody really thinks otherwise anymore. But for example, set-cdr! is not in the lambda calculus, and you need it even for basic things.

Likewise, Scheme implementations have mutable hash tables, but they're written in C and not Scheme. I don't know how you even write a hash table based on cons cells rather than O(1) indexing.

Regarding #4, here is a good link. The basic idea is that macros could just be functions on lists, and then you get composition of macros like you have composition of functions. Paul Graham incorporated this into Arc.

http://matt.might.net/articles/metacircular-evaluation-and-f...

My point is you could say Scheme sits at a somewhat awkward place between "not foundational enough" (not fully bootstrapped) and "awkward in practice" (compared to say Python). Though I wouldn't go as far as to say that... Obviously it was groundbreaking work that influenced Python and R and tons of stuff we use today. It's outstanding research, but it feels like it has been almost fully incorporated into the computing culture now.

>there is a reason that C is the foundation of computing rather than Lisp. I don't think anybody really thinks otherwise anymore.

C is not the foundation of computing. Why would you say this?

>Likewise, Scheme implementations have mutable hash tables, but they're written in C and not Scheme.

A native code compiler written in Scheme would have its hash table implementation also written in Scheme.

>I don't know how you even write a hash table based on cons cells rather than O(1) indexing.

You wouldn't do that! Cons cells are not the only primitive data type! Another primitive type in Scheme is the vector, which is a mutable array.

I'm sorry, but you greatly misunderstand Lisp and how compilers work.

>C is not the foundation of computing. Why would you say this?

Because all the popular OSes, drivers, userlands, servers, GUI libraries, and compilers/languages are 99% written in C (or C++ which is close enough).

C is the foundation of computing in the sense that essentially every programming language and operating system is written in C or C++, and those are the tools that enable every other piece of software. More precisely, I would say that C is the foundation of software; it's how we stopped throwing out our programs when we changed computers. The first portable operating system kernels were written in C.

I guess I could have been more precise and said that the lambda calculus (rather than Scheme/Lisp) is not the foundation of computing. It seems like there are people who still think this; see my recent response here:

https://news.ycombinator.com/item?id=11412392

You could say Lisp and Scheme are proof of that. To actually be bootstrapped, they had to add all this other stuff like vectors and hash tables. I don't know the details of how well those are axiomatized. Paul Graham's Arc tried to a little further down, i.e. unifying functions and macros, defining numbers in terms of lists a la Peano arithmetic, etc., but I'm not sure how far that effort went.

I mentioned all my experience with Lisp... doing SICP 19 years ago, and then coming back to it. As I said, I think it's outstanding research, but if you are trying to build an entire computing universe out of it, that's folly. Good luck. It's just not powerful enough -- once you add all the stuff you actually need, you're not far from the complexity of C.

" there is a reason that C is the foundation of computing rather than Lisp"

That's a myth. Try using first-hand sources like the papers from people who designed BCPL, B, and C to understand why it's that way. Answer: terrible hardware back then. That's it. Its popularity was result of prevalence of terrible hardware and that UNIX was written in C. I broke it's history down in just a few pages with a timeline and references here:

http://pastebin.com/UAQaWuWG

Likewise, before the social effect, the OS's were coded in a number of HLL's with capabilities UNIX lacked. Languages included ALGOL, PL/0, and Pascal. Later, they were done in Modula, Oberon, Ada, Fortran (yeah lol), LISP, and so on as hardware improved beyond 1970's minicomputers. Here's some UNIX alternatives and their capabilities that developed... some of which you still don't have. :)

https://news.ycombinator.com/item?id=10957020

Far as LISP, it's been implemented in hardware multiple times. This included naive ones that worked like a simple evaluator with garbage collection built into the memory-management unit. There was also one that had four specialized units for more sophisticated execution. One Scheme was designed and mathematically verified using the DDD toolkit that was also LISP if I recall.

http://www.cs.indiana.edu/pub/techreports/TR544.pdf

Oh heck, forgot they did it with VLISP. That was a Scheme48 interpreter and PreScheme compiler rigorously verified for correctness. PreScheme was a Scheme subset for systems programming. So, the work actually took a verified Scheme then mechanically derived verified HW from that Scheme code using a LISP-based tool. I recall from other papers they got it working on a FPGA and some PAL's.

Whereas, I don't know many small teams producing verifiable C code on verifiable C processors from verifiable C tools. Nah, I don't think your favorite language is anywhere near where you think it is. I don't even find LISP ideal here by far. It just did more in functionality & bare metal. Also, first LISP machine was started when UNIX was released interesting enough.

> there is a reason that C is the foundation of computing rather than Lisp. I don't think anybody really thinks otherwise anymore

I must not be anybody, then, because I think that Lisp provides a wonderful notation for thinking about symbolic computation which will last for millennia while C is … a successful programming language of the late 20th century.

> I don't know how you even write a hash table based on cons cells rather than O(1) indexing.

A cons cell is just a double-pointer. You can write a hash table using conses just as easily as you would using pointers (note, I'm not saying that'd be efficient, which is why Lisp offers arrays as well as conses).

C isn't the foundation of computing. It's a popular systems language largely because of, first, it's connection with UNIX, and, second, network effects of its early popularity.
C is far too high level to be a "foundation". NAND gate is a foundation.
> 1) It sure is awkward to represent struct fields 1 2 3 as (cdr struct), (cadr struct), (caddr), ...

The Right Way to do this is with a D-List, or Detached List, analogous to an A-List or a P-List. An A-List, if you recall, is a list of key-value conses. A P-List is a list of alternating key-value pairs. A D-list is a cons of a list of keys and a list of values, i.e.:

((key1 key2 ...) val1 val2 ...)

D-lists are superior to A-Lists and P-Lists because:

1. The key list structure can be re-used

2. D-ASSOC only requires one traversal down the key list, after which the index of the key can be cached. This is usually the first step in writing a "fast" interpreter, but if you use A-Lists or P-Lists then you have to change data structures. If you use a D-List you already have the optimized structure in the CDR of the D-List pair.

3. Going from optimized interpreter to full compiler is a simple matter of replacing the linked list of values with a vector of values.

It's a shame that D-Lists are very rarely taught.

Interesting. I've never heard of D-list nomenclature before.

With the A-list method you typically need to write a "zipper" function, that recursively conses the heads of two lists to generate the A-list. Makes "apply" an expensive operation with needless allocations.

What's great about D-lists is that the "make-env" method is just a single cons operation. Clearly superior. I'm surprised its not more well known.

> Clearly superior.

Thanks.

> I'm surprised its not more well known.

Yeah, me too :-(

Only global/dynamic environments should be hashed. If you're optimizing lexical environments from assoc to hashing, you're optimizing interpreted semantics instead of writing a compiler. The location of a lexical variable isn't a moving target; it sits at some statically fixed offset in some environment frame, and access to it is reduced to an indexing operation in the compiled code.

Hashing lexicals will not necessarily speed up an interpreter. It depends on what kind of code and how you do it. A lot of code has only a few lexicals at any binding level. If you construct a new hash table on each entry into a binding construct which has only a handful of variables, that could end up performing worse than the original assoc lists. You still have to cascade through multiple hash tables under that approach to resolve nesting. One hash table for an entire lexical scope leaves you with problems like how to resolve shadowing, and how to capture closures at different sub-nestings of that scope that have different lifetimes from containing scopes.

#1 - srfi 9 solves this mostly, though I agree about the long names. If your implementation doesn't bundle that for you, you should probably use a different implementation.

#2 - preach it.

#3 - The "big" schemes provide hash tables for you.

#4 - Well... fexprs used to be standard in lisp, but implementers didn't enjoy figuring out what a symbol meant at compile time.

I dunno, I really enjoy coding in scheme. Especially syntax-rules. I know syntax-case is the bees knees, but syntax-rules is pretty easy for me to model quickly.

Regarding #2: I have a complete version of McCarthy's original evaluator at https://programmingpraxis.com/2011/11/01/rip-john-mccarthy/, and it doesn't use a single mutation.
Schemes minimalism and elegance makes it a great first language (in my opinion). Plus you can explore multiple programming paradigms - functional, imperative, object oriented. There are legitimate reasons why scheme isnt a very practical language, but its a good first language imo.
>There are legitimate reasons why scheme isnt a very practical language

The operating system I currently run uses a Scheme program as its init system and a Scheme program as its package manager. Scheme is a practical language.

What operating system is it? Is the Scheme written in C or assembly language?
See GuixSD: https://www.gnu.org/software/guix/

Edit: remove useless commentary

To be fair to perception, that Scheme (Guile, I'd imagine) isn't exactly the subset of MIT-Scheme used in SICP.
SICP 2e sticks to IEEE Scheme.
Is the fundamental issue in #2 that you don't like the syntax? or that Scheme has mutation?

It's interesting looking back on the history of Scheme. Probably because of the order of presentation of SICP, along with hearsay, people seem to get an impression of Scheme being all about functional programming (and if it's because functions are values you can pass around... well even Algol and Pascal could do that). It's true the original paper was called "Scheme: an interpreter for the extended lambda calculus" [1], but the big idea was that if variables were bound lexically, and if the environment structures used to close the variables were mutable, then you would have something like the actor model -- functions could have state and respond to messages. As they admit in the abstract, the purpose was to demonstrate the core interpreter for implementations of contemporary AI systems. The chapter on register machines in SICP is just an elaboration of their methods in this paper.

[1] http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-...

Something about porting a metacircular evaluator seems odd to me. Not that it's a bad exercise -- it's a good one. But rather, that it's no longer a _metacircular_ evaluator. The progression of the book is 1) abstracting processes and names with procedures 2) abstracting data representation 3) abstracting interface as a module, along with the theory and practice of mutation 4) abstracting semantics with interpreters (programs which take programs as data). The proof that 4 is a thing is by implementing an interpreter for the language in the language and then extending the interpreter in various ways. Even procedure calls are implemented by calling a procedure in the host language. (All I'm saying is that you exercised the metalinguistic abstraction by writing a scheme interpreter in femtolisp. Metacircularity in itself is not much more than an interesting phenomenon to demonstrate.)

(P.S. The language in SICP is not the Scheme which was later standardized, say R5RS + SRFIs. Mutation of lexical closures didn't change. You can't really (cleanly) get away from that until you invent something like a monad to model state.)

It's both... I feel like Scheme is oversold as axiomatic mathematics. You have to add all this other stuff to it to make anything useful, in particular mutation and mutable data structures. Even toy programs like the metacircular evaluator SICP need this!

And yes, I've been programming a lot in the intervening 19 years, and doing imperative programming with ((Lisp)) syntax is hugely annoying. OCaml actually annoyed me in this regard too. Maybe I will like Haskell, since it seems principled about mutation.

This is a real misunderstanding, see my response to this comment here: https://news.ycombinator.com/item?id=11412392

People think that there could have been some "Church basis" for computing. In other words, the whole Lisp machines thing was folly. It's rightly in the dustbin of computing history.

FWIW I did many experiments in bootstrapping languages, with Python/Lua, OCaml, femtolisp, C, ... I eventually ended up with (a tasteful subset) of C++, which somewhat amazed me, since I've never been one to like C++. This is a whole other story, but it had to do with the fact that OCaml "needs" code generation with ocamllex and ocamlyacc/menhir, and I was looking at how Julia is bootstrapped Lisp (impressive, but not what I want), etc.

In pursuit of a mathematically derived Scheme, you might be interested in John Shutt's Kernel [0], based on his formal theory of F-exprs called the Vau Calculus [1].

[0] http://axisofeval.blogspot.com/2011/09/kernel-underground.ht...

[1] http://lambda-the-ultimate.org/node/4093

I agree with you on this. I used lisp/scheme a fair bit in 2004-2008 and over time have gravitated to C++. Not being able to treat memory as a first class primitive ends up being restrictive eventually.
I don't really understand our what your issue with ocamllex/ocamlyacc is. Clearly you don't need them if you are willing to write your lexer/parser yourself. Also the same tools exist (and originate) in Unix/C land with (f)lex and yacc (a.k.a bison)
WRT 3, assoc lists are just the simplest thing to store things like environments, they're traditional for "writing a Lisp in one week". If you modulerize your code, you can trivially use hash tables or whatever later. Ditto using lists for structures.

4: As I understand it, there are many macro systems available for Scheme. It's one of the languages in which "hygienic" macros has been explored, etc. So if you don't like the femtolisp bundled version, there might be another out there more to your taste.

If it's being used as a teaching language, it depends on the students. Some students will rail against the idea of learning a tool, any tool, that they don't think they would use professionally, even if the purpose of using the tool is to give them a different perspective on programming.

Also, some people just really hate the parentheses (not me).

Also, some people really hate the Beatles (not me). More so in the '60s as contemporary artists, less so now when they're more likely to be referred to in a historical context. But still. The White Album had no shortage of devastating reviews when it came out.

I'd imagine Scheme as a teaching language would be less controversial in schools where the CS students are more likely to already know some programming when they start, and/or don't have as much of a trade school mentality.

I newly joined this company, and they have 20 years old codebase which is mix of C and Scheme code. The only way they debug the massive Scheme part is using print statements. And I only have bad things to say about that. :( If there's a better way all of them have been missing, I'd love to hear that. I've learned that they have adapted the MIT Scheme implementation to add Object Oriented features, and it "kind of" works like an object oriented language, except when it doesn't - which happens a lot. It's a mess.
Pretty much every 20 year old codebase is a mix of terrible stuff. The exceptions are rare and usually involve a strong handed dictator who is willing to make cleanups from time to time.

You're lucky it isn't Fortran and a homegrown (crappy) macro language.... You can't fairly judge Scheme or C from a legacy codebase unless you judge every other language that way too.

I'm judging only from its debugging capabilities - and it looks like even now there's no (open to all) way to do that. Compared to that, C even if that's 20 years old code too, has some great debugging tools for it.
> The only way they debug the massive Scheme part is using print statements.

Why would you need anything else for debugging?!?

See I'm so naive I don't even know if this is sarcasm. :/
I'm 100% serious. Interactive debugging is hugely overrated. I am not aware of a debugging technique better than logging + contracts + asserts.
For one, that forces you to change the code and recompile all the time. And some projects have huge recompile times.
I was fortunate enough to take both his compiler course as well as a follow-up course involving optimization and hygienic macros. Brilliant man, fantastic teacher, and of course, he wrote a great compiler :)
Thanks for referencing Dybvig's compiler course, can you point me to the course materials online, or share them with us here, if it's not an issue of copyrights of course?

I tried looking the course materials online but the Indiana University website gave me a 404 error page when I tried to access the course from Dybvig's website.

Thanks.

Paul is that you?