Hacker News new | ask | show | jobs
by etfb 4516 days ago
In Forth, the following are fairly common definitions in the core:

    0 CONSTANT 0
    1 CONSTANT 1
    -1 CONSTANT -1
In an Algolian language like C++ or Java, if the syntax were legal, these would be:

    const 0 = 0;
    const 1 = 1;
    const -1 = -1;
The reason for this is to do with the interpreter. How the Forth interpreter works is very simple:

1. Read one word, where "word" is literally defined as "a sequence of non-space characters delimited by spaces". Some standard words include DUP 2DROP 1+ - . " and :

2. Attempt to find the word in the dictionary. If found, execute it.

3. If it's not found, attempt to interpret it as a number. If that works, push the number on the stack.

4. If neither 2 nor 3 succeed, emit an error message.

So the reason for defining a constant named "0" is to save time: it's quicker to interpret the word "0" if it's in the dictionary; otherwise you have to search the whole dictionary and then do a text-to-number conversion, which takes too long.

So really it does make sense.

2 comments

It is about space, not time. What you describe happens interactively. At that time, performance does not matter much, after compiling, it does.

At compilation time (the word : is one of the ways to switch the system to compilation mode), Forth doesn't execute words and numbers if finds, but it adds their addresses (in some form, depending on the implementation) to the generated code. When running the code later, the runtime simply fetches each of these function addresses and 'calls' them (that's what makes Forth systems fast without the need for any convoluted compiler technology. Depending on the implementation, that 'call' may be an actual call in the CPU, but it typically is just another 'grab the addresses found there in sequence and execute them' and yes, it can't be turtles all the way down, but Forth gets awfully close)

Now, the question is: how do you compile "push this constant on the stack"? Forth does it by compiling the word called LITERAL, followed by the constant.

So, compiling a function called 0 or 1 adds a function address to the compiler output, but compiling a literal constant adds the address of the function called LITERAL and that constant. For constants used more than a few (where the exact limit depends on he particular Forth implementation) times, the extra space needed for that function gets more than compensated by the gain. In typical Forth systems, -1, 0, 1, and 2 already are space savers before the user types his first character. That's why they are predefined. If you use another constant often, you can easily define it. Many Forth systems even have a function called CONSTANT for it, but you can define it yourself, if it is absent.

For those wondering how the system knows that that constant it compiled isn't the address of a function to call: it doesn't. Instead, the function called LITERAL, when called, hooks into the runtime, uses the 'current instruction pointer' to read the value to push on he stack, and then increases that pointer to point past the constant. When LITERAL returns, the runtime just reads what it thinks is the next pointer and calls it.

This also depends on the flavor of FORTH you're using. The one I wrote in high school (oh so many years ago) compiled directly to executable code, with constants compiled to direct pushes, so "redefining" a constant this way would run more slowly.

While I absolutely hate FORTH as a production language (I've seen millions of dollars flushed down the tubes by FORTH afficianados who were unwilling to admit that their code was unmaintainable, slow, and didn't work), it's fun to play around with. Everybody should write at least one FORTH in their career.