| To provide more precedents and a little history: The first C "interpreters" I know of were for Lisp machines: Symbolics' C compiler (http://www.bitsavers.org/pdf/symbolics/software/genera_8/Use...) and Scott Burson's (hn user ScottBurson) ZetaC for TI Explorers/LMIs and Symbolics 3600s (now available under the public domain: http://www.bitsavers.org/bits/TI/Explorer/zeta-c/). Neither of them are interpreters, just "interactive" compilers like Lisp ones are. I am writing a C to Common Lisp translator right now (https://github.com/vsedach/Vacietis). This is surprisingly easy because C is largely a small subset of Common Lisp. Pointers are trivial to implement with closures (Oleg explains how: http://okmij.org/ftp/Scheme/pointer-as-closure.txt but I discovered the technique independently around 2004). The only problem is how to deal with casting arrays of integers (or whatever) to arrays of bytes. But that's a problem for portable C software anyway. I think I'll also need a little source fudging magic for setjmp/longjmp. Otherwise the project is now where you can compile-file/load a C file just like you do a Lisp file by setting the readtable. There's a few things I need to finish with #includes, enums, stdlib and the variable-length struct hack, but that should be done in the next few weeks. This should also extend to "compiling" C to other languages like JavaScript, without having to go through the whole "emulate LLVM or MIPS" garbage that other projects like that do. I think I figured out how to do gotos in JavaScript by using a trampoline with local CPS-rewriting, which is IMO the largest challenge for an interoperable C->JS translator. As to how to do this for C++, don't ask me. According to the CERN people, CINT has "slightly less than 400,000 lines of code." (http://root.cern.ch/drupal/content/cint). What a joke. |
What I can't wrap my head around is how one would implement pointer-arithmetic with these closures? C pointers are not just references to cells, but those cells are guaranteed to be contiguous (up to a certain limit, be it an VM allocation unit, say "page", or all available system unit in VM-less systems)
That is to say, C pointers are not like ML references. Along with SET and REF they also allow addition, subtraction, scaling, etc.
For closures to model C pointers, wouldn't they need to order the allocation of cells in some manner? say, big array? And if so, this could get expensive very quickly (worst-case being "modeling" of entire memory, i.e. emulation) without certifying compiler or at least some exhaustive pointer analysis.
Hope I'm wrong on this.