| HN Mirror


by moron4hire 1479 days ago

I'd say there's probably about 1/2 to 2/3rds overlap between the two. You'd still need to do all the same lexing and parsing. You still need to create a model of execution for your language.

But the primitive substrate of the machine (be it real or virtual) turns implementing that execution model into a puzzle all its own. You might implement an array in your interpreted language as just an array in your host language. You might implement an object as just a HashMap between field names and values in your host language. Interpreters don't have to build their own interfaces with the operating system; they reuse those interfaces from the host language in which they are developed. A compiled language, on the other hand, will need some way of its own to execute system calls.
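A minimal sketch of that idea, assuming Python as the host language and a made-up tuple-based AST (the `evaluate` function and the node shapes are hypothetical, just for illustration). The guest language's arrays are plain host lists and its objects are plain host dicts:

```python
# Hypothetical tiny evaluator: guest arrays are host lists, guest objects are host dicts.
def evaluate(node, env):
    kind = node[0]
    if kind == "num":        # ("num", 3)
        return node[1]
    if kind == "var":        # ("var", "x")
        return env[node[1]]
    if kind == "array":      # ("array", [expr, ...]) becomes a host list
        return [evaluate(e, env) for e in node[1]]
    if kind == "object":     # ("object", {"field": expr}) becomes a host dict
        return {name: evaluate(e, env) for name, e in node[1].items()}
    if kind == "index":      # ("index", array_expr, index_expr)
        return evaluate(node[1], env)[evaluate(node[2], env)]
    if kind == "field":      # ("field", object_expr, "name")
        return evaluate(node[1], env)[node[2]]
    raise ValueError(f"unknown node kind: {kind}")
```

Memory management, bounds checks, and hashing all come for free from the host here, which is exactly the work a compiler targeting bare hardware would have to supply itself.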

Like, printing some text to a terminal. Say we were implementing an interpreted language in C#. We get to the point of writing our own "print" function for our own language. We'll probably end up creating some kind of translation from our language's print format to some kind of call to System.Console.WriteLine in C#. You might even like the format of Console.WriteLine, so you might even make it a straight, one-to-one translation and call it a day.
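Roughly what that one-to-one translation could look like, with Python's built-in `print` standing in for System.Console.WriteLine (an assumption for the sake of a short example; `guest_print` is a hypothetical name). Python's `str.format` happens to accept the same `{0}`-style positional placeholders for simple cases, so the guest language can borrow the format wholesale:

```python
import sys

def guest_print(fmt, *args, out=None):
    """Guest-language 'print': delegate formatting and output to the host."""
    stream = sys.stdout if out is None else out
    # The host does all the real work: formatting, encoding, and talking to the OS.
    print(fmt.format(*args), file=stream)
```

Calling `guest_print("{0} + {1} = {2}", 1, 2, 3)` writes the formatted line to standard output; the interpreter never touches a file descriptor directly.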

But if you're writing a compiled language, you'll need to know how the operating system you're running on expects to receive and execute commands. That's a whole other thing. To grossly oversimplify, it largely means getting a bunch of bytes into memory in the right format and order and then executing a specific CPU instruction. Want to allocate memory? There's another blob-of-memory-plus-execute-an-instruction interface you'll need to adhere to. Want to open a network socket? Same. And modern operating systems provide a lot of functionality.
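To give a feel for the shape of that interface without writing assembly, here is a sketch using Python's `os.write`, which on Unix-like systems is a thin wrapper over the `write` system call. The `raw_print` name is hypothetical. A real compiled language would have to emit the trap instruction itself (for example `syscall` on x86-64 Linux) instead of leaning on a host runtime like this:

```python
import os

def raw_print(text: str, fd: int = 1) -> int:
    """Write text to a file descriptor with os.write and return the byte count."""
    data = text.encode("utf-8")      # the "bunch of bytes in the right format"
    written = 0
    while written < len(data):       # the OS may accept fewer bytes than offered
        written += os.write(fd, data[written:])
    return written
```

Everything the OS sees is a file descriptor number, a buffer of bytes, and a length. The formatting, encoding, and retry loop are the caller's problem.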

But also, a lot of that work is kind of grunt work. There are certainly ways you can design a language that make it more difficult to implement as a compiled language than an interpreted one (dynamic typing, for example). Let's gloss over that issue. All else being equal, the language portion of the work being done in interpreters versus compilers is largely the same.

3 comments

thefaux 1479 days ago

This way of describing compilers seems to imply that a compiler must emit machine specific assembly code, which seems overly narrow. A different way to think about compilers vs interpreters is that compilers are programs that read source code as input and generate an executable artifact as output while interpreters are programs that read source code as input and then, as a side effect, perform the instructions within the interpreter process.

Note that taking this broader definition of compilers, it is not necessary for a compiler writer to target the host architecture or learn about the system calls. Many languages have a non-native host target, e.g. TypeScript (JavaScript), Scala (JVM) and F# (.NET), but we still call the programs that translate source written in these languages to the target executable format compilers.

Going from an interpreter to a transpiler, which I personally consider a compiler, can be an almost trivial step. Let's assume that there is already an interpreter for the language and that it is implemented as a giant switch statement based on the opcode of each instruction. Given an arbitrary target language in which all of the required instructions of the interpreter have a concrete representation, one could write a transpiler to this target language by replacing the right-hand side of each statement in the switch with code that appends to a source file in the target language (there'd also generally need to be some surrounding boilerplate to do things like import required headers).
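A minimal sketch of that step, assuming a made-up four-instruction stack language and Python as both the implementation language and the transpilation target (all opcode names here are hypothetical). `interpret` is the "giant switch" and `transpile` is the same switch with each branch replaced by code that appends a line of target source:

```python
def interpret(program, out):
    """Run the program directly; printed values are appended to `out`."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            out.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")

def transpile(program):
    """Same dispatch as interpret, but each branch emits target source instead."""
    lines = ["stack = []", "out = []"]   # surrounding boilerplate
    for op, *args in program:
        if op == "PUSH":
            lines.append(f"stack.append({args[0]!r})")
        elif op == "ADD":
            lines.append("b, a = stack.pop(), stack.pop(); stack.append(a + b)")
        elif op == "MUL":
            lines.append("b, a = stack.pop(), stack.pop(); stack.append(a * b)")
        elif op == "PRINT":
            lines.append("out.append(stack.pop())")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return "\n".join(lines) + "\n"
```

Given `program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("PRINT",)]`, `interpret` computes the result in-process, while `transpile` returns a standalone source file that computes the same thing when run later. That difference is the artifact-versus-side-effect distinction from the definition above.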

In practice, these days it is quite common for languages to transpile to C, LLVM IR, the JVM or JavaScript. Even if one does want to emit their own machine code, it would still probably make sense to first target something simpler and not waste time in the low level details of language features that may or may not even prove useful (or the language itself may not prove useful). Again, going from interpreter -> transpiler can be a simple step. It is not unrealistic to write a useful transpiler in a day, particularly if you make the language syntax very simple and/or use a parser generator.


moron4hire 1479 days ago

> A different way to think about compilers vs interpreters is that compilers are programs that read source code as input and generate an executable artifact as output while interpreters are programs that read source code as input and then, as a side effect, perform the instructions within the interpreter process.

I think that's a very fair definition, and one I agree with completely. But I also admitted that I was grossly oversimplifying, which I thought was necessary given the stated background of the person I was responding to. As you pointed out, the step from interpreter to transpiler is almost trivial. My goal was to attempt to describe the much less trivial portions of the work without getting too bogged down in details.

But you make good points about transpilers that I probably should have mentioned. Lots of very good, very valuable work has been done with languages that have not gone all the way to emitting CPU-specific op codes.


13of40 1479 days ago

> ...Want to open a network socket? Same...

From your compiler's perspective you shouldn't be messing around with all that; you should have an abstraction that lets you say "Pass a by value, b by reference, and c as an out parameter using calling convention X". As long as malloc and opensocket or whatever use the same calling convention, all of the actual byte layout is a one-time effort.
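One way such an abstraction might look, as a rough sketch only. The `Param` and `lower_call` names are hypothetical, the output is pseudo-assembly text rather than real machine code, and the register order assumes the System V AMD64 convention for integer and pointer arguments:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:
    name: str
    mode: str  # "value", "ref", or "out"

# Assumed convention: integer/pointer arguments go in these registers, in order.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def lower_call(func, params):
    """Turn a declarative call description into pseudo-assembly lines."""
    if len(params) > len(ARG_REGISTERS):
        raise NotImplementedError("stack-passed arguments are not handled in this sketch")
    code = []
    for reg, p in zip(ARG_REGISTERS, params):
        if p.mode == "value":
            code.append(f"mov {reg}, [{p.name}]")   # load the value itself
        elif p.mode in ("ref", "out"):
            code.append(f"lea {reg}, [{p.name}]")   # pass the address instead
        else:
            raise ValueError(f"unknown passing mode: {p.mode}")
    code.append(f"call {func}")
    return code
```

The layout logic lives in `lower_call` once; every external function that follows the same convention is then just another list of `Param` entries.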


moron4hire 1479 days ago

exercise for the reader


ammanley 1479 days ago

Really appreciate the thorough review, this sheds a lot of light on things for me. Thanks a ton.
