Hacker News new | ask | show | jobs
by dfox 3478 days ago
I see few flaws in their approach:

The stack based virtual machine represents instructions as two field structs that are essentially predecoded (and larger in memory) while the register VM uses what boils down to array of words which are then presumably decoded at run time (the paper even contains some kind of argument why the code is represented as array of single-field structs, but completely neglects to mention what would happen if the structs contained some kind of pre-decoded instructions).

The function call mechanism of their stack based VM seems to be essentially modeled after how forth works, which is not how typical stack based VM used in implementation of languages that do not directly expose the VM semantics works. Typical stack based VM (eg. Smalltalk, JVM, CPython...) allocates new argument stack for each function invocation (often using alloca() or such, ie. on native control stack) and does not directly use the VM argument stack to pass parameters and results between functions.

2 comments

How do most stack based VM represent instructions? When I went through a phase of writing VMs, the approach I used was to represent everything with a single byte. if a "push" byte was encountered, it would scan ahead 8 more instructions and construct an int. So it had the overhead of having to decode and encode actual data, but the instruction stream was very dense.

I imagine in the real world, they use something like

    union instruction_packet {
        int64 value;
        int8[8] instructions;
    }
Which would be less space efficient (when you hit a push instruction you'd need to move to the next packet, even if it was the first element of the array) but gets rid of the decoding/encoding problem.
The JVM uses an 8-bit opcode, of which 202 are defined. Some opcodes have implicit operands (e.g., iload_1, load the first local variable as integer on the stack or iconst_m1, push -1 on the stack), while others take usually an index that's usually 1 byte (or 2 byte for jump offsets)--although there's a wide instruction that makes the next instruction use 2-byte operands instead.

Instruction decoding for the entire JVM instruction set save tableswitch and lookupswitch amounted to about 50 lines of code, not counting the enum listing every opcode.

I more meant - what shape is in the instruction stream? Is it literally just a stream of individual bytes, that have to be decoded when pushing constants etc?
The Code attribute in the JVM spec consists of the number of local variable slots and the maximum stack depth, a byte array, and a collection of attributes to represent things like the stack map or exception handling tables.

Most of the opcodes have zero or one operands, usually a single byte (or two bytes with a prefix) that's an index into the local variable array or the constant pool. There's several ways to load constants: the integer constants -1 through 5, long constants 0 and 1, float and double 0, 1, and 2 all have 0-operand opcodes (e.g., iconst_0); there's an opcode to load a 1-byte immediate, another to load a two-byte immediate, and an opcode to load a constant in the constant pool (which can be a string, a Class<?> reference, an integer, long, double, or float).

It should be noted that the JVM specifies a fixed big-endian format in its bytecode, and as a result, even the two-byte immediates are specified in the manual as two operands of a single byte each.

That's one possible design, but it's common to use a constant pool. Placing constants in the instruction stream is possible but awkward, since there is no guarantee that they are either aligned or of the right endianness (assuming you are trying to write a more or less portable system).

Constants that fit in a byte don't have this problem and can be easily placed in the instruction stream using dedicated byte constant instructions, sometimes including dedicated single-byte instructions for common constants like zero, one, etc.

Forth a register-based VM right?

Does it the VM stack as well instead of the native stack?

Some Forths compile to native machine instructions and no VM is involved. Sometimes they compile a list of threaded jump addresses with an "interpreter" that's basically just two instructions: JSR [nextaddress++]; LOOP. Sometimes they just compile literal sequences of JSR instructions and there's no interpreter at all.
Wow this is really interesting. I didn't realize there were different implementations.

Do you have a resources regarding these? I would love to learn more about this.

The wikipedia entry [0] contains some refs. The out-of-print book "Threaded Interpretive Languages" [1] was the definitive treatment of the topic, and what I mostly used back when I was writing Forth compilers.

[0] https://en.m.wikipedia.org/wiki/Threaded_code

[1] https://www.amazon.com/Threaded-Interpretive-Languages-Desig...

Moving Forth is pretty good: http://www.bradrodriguez.com/papers/moving1.htm

There are 8 parts to the series, you can look at all of them and other Forth writings by the author on his website: http://www.bradrodriguez.com/papers/

Thanks for the links!

For anyone else interested I found this book which is about hardware-based stack machine sand Forth.

To quote the book:

"While some of the material is obviously dated and new work has been done in this area, I believe that the book remains the principal reference work on Forth-style stack computers. "

https://users.ece.cmu.edu/~koopman/stack_computers/index.htm...

It is also available on Amazon for a dollar.

The author's page:

https://users.ece.cmu.edu/~koopman/stack.html

Seeing as you know something about Forth. Is it used somewhere in production outside of purely academic setting?
FreeBSD bootloader uses Forth:

https://www.freebsd.org/cgi/man.cgi?query=loader&sektion=8

Also, while it's not Forth, it could be argued that the RPL language as used on HP scientific calculators, is a kind of higher-level Forth produced by cross-breeding with Lisp. Indeed, its name, while officially not an acronym, originated as "Reverse Polish Lisp". Its implementation is also kinda awesome, because it has two layers - User RPL, which is more high-level and completely memory-safe, with error checking everywhere, and is exposed directly to calculator users; and System RPL, which basically just compiles the source directly to a bunch of call opcodes (so if you mismatch the stack at any point, it just crashes), and which is used to implement the calculator OS and stock apps, as well as User RPL.

http://www.hpmuseum.org/rpl.htm

Yes, but its heyday was the 70s through mid-80s. The most recent use I'm familiar with was the OpenBoot firmware in OLPC laptops, though I've been out of it so long I wouldn't know what's current.
Anyone that worked with Sun Sparc hardware probably remembers the PROM monitor(SUN's BIOS basically) which was written in Forth and had a Forth interpreter. There's a couple of questions on this FAQ about Forth:

http://sunsite.uakom.sk/sunworldonline/swol-10-1995/swol-10-...

Not as much nowadays as back when embedded computers were severely memory-limited. In those days, you could use Forth and get a semi-high-level, interactive REPL language with a resident compiler in a few K of memory. Now you would usually just use C or Python because memories are huge.
While we're on the subject, Forth is interesting because as a stack-based language, it forces you to structure your programs as function compositions. This was not a popular idea in the 80s, but with the increasing popularity of functional programming, it's relevant again.
One more I forgot: Postscript (the full language of which PDF is a simplification). It's not Forth per se, but it's very Forth-like in much the same way Clojure is Lisp-like.
I know of at least 2 commercial implementations: VFX Forth by mpe[0] and SwiftForth by Forth Inc[1]. Both companies are still in business, selling their tools for hundreds to thousands of dollars, so there is obviously some people using them. I think they have a partial customer listing somewhere on their site. Oh, and there's also gforth of course.

Then again, part of the charm of forth is how easily it is to bootstrap your own forth on your own system. There could be countless homegrown forths out there, in addition to these commercial offerings.

I haven't used forth for anything in production, myself. I just think it's a really interesting language. It's a great "what if" question. What if everything built on C had been built on Forth instead? What if all the work in improving the C language, tooling, operating systems, libraries, hardware etc had been spent improving Forth and stacks instead? It's fun to ponder.

If you want to read more about forth, check out Thinking Forth[2]. It's really got some great insights about programming and software design in general, not just applicable to forth programming in particular.

Starting Forth[3] is also good more of a beginner's introduction to forth. And Moore's own Problem Oriented Language is a good book talking about what his intentions were when he created forth[4].

For more reading: This page has a lot of good forth links too [5] Chuck Moore's color forth blog[6] This site has lots of interviews with Chuck Moore on it and other forth opinion pieces[7] Forth Interest Group site[8]

There's also a new forth standard that's being worked on[9]. Seems pretty active, too.

[0] http://www.mpeforth.com/software/pc-systems/vfx-forth-for-wi...

[1] https://www.forth.com/swiftforth/

[2] http://thinking-forth.sourceforge.net/

[3] https://www.forth.com/starting-forth/0-starting-forth/

[4] http://www.forth.org/POL.pdf

[5] http://www.complang.tuwien.ac.at/projects/forth.html

[6] https://web.archive.org/web/20160305013837/http://www.colorf...

[7] http://www.ultratechnology.com/

[8] http://www.forth.org/

[9] https://forth-standard.org/

Forth's machine model has two stacks, one for return addresses and one for data, plus random-access memory. In some small Forth machines, these three memories were physically separate, and all three had an access on each machine cycle.
AFAIK Forth is a stack-based language, so most VMs are stack-based.

I might be wrong.

In any case I'm very interested to learn about what register-based Forth VMs are out there!