| There is an aspect of the history of Forth and C I have been trying to wrap my head around. The early B compiler was reported to generate threaded code (like Forth). The threaded code was abandoned fairly early in the port from the PDP-7 to the PDP-11, as it was deemed too slow to write an operating system in. At that point Unix and C lost a very interesting size optimization. The net result was that Forth was portable between machines circa 1970, while Unix had to wait until circa 1975 with Unix V6. I have been trying to go back through the history to see if I could understand why threaded code was deemed too slow. Today most of the code in an executable is not run often and would probably benefit from a smaller representation (less memory and cache space, making the system faster overall). So this is a practical question even today. I found a copy of Unix for the PDP-7, reproduced from old printouts that were typed in. If I have read the assembly correctly, the B compiler was not using an efficient form of threaded code at all. The PDP-7 is an interesting machine. Its cells were 18 bits wide. The address bus was 12 bits wide, which meant there was room for an opcode and a full address in every cell. As I read the B compiler, it was using a form of token threading with everything packed into a single 18-bit cell. The basic operations of B were tokens, and an if token was encoded with a full address in the same cell. Every token had to be decoded via a jump table; the address of the target code was then plugged into a jump instruction which was immediately run. Given the width of the cells, I wonder what the conclusions about the performance of B would have been if subroutine threading or a similar technique using jmp instructions had been used. Does anyone know if Forth suffers measurably in inner loops from having to call words that perform basic operations? 
Is this where a Forth programmer would typically write the inner loop in assembly to avoid the performance penalty? |
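To make the token-threading description above concrete, here is a rough sketch in C of that kind of dispatch. It is not taken from the PDP-7 B sources. The cell layout (an 18-bit cell held in a `uint32_t`, a 6-bit token over a 12-bit operand), the token names, and the functions `run_tokens` and `sum_down_from` are all assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: an 18-bit cell kept in a uint32_t, with a 6-bit token
   in the upper bits and a 12-bit operand or address in the lower bits. */
#define OPERAND_BITS 12u
#define OPERAND_MASK 0xFFFu
#define CELL(tok, arg) \
    (((uint32_t)(tok) << OPERAND_BITS) | ((uint32_t)(arg) & OPERAND_MASK))

/* Hypothetical token set, just enough for a small loop. */
enum { T_HALT, T_PUSH, T_ADD, T_DEC, T_OVER, T_SWAP, T_DROP, T_JNZ };

typedef struct {
    size_t pc;                 /* index of the next cell to decode */
    int32_t stack[8];          /* small data stack */
    int sp;                    /* number of items on the stack */
    int running;
    unsigned long dispatches;  /* how many cells were decoded */
} TokenVm;

typedef void (*Handler)(TokenVm *vm, uint32_t arg);

static void op_halt(TokenVm *vm, uint32_t arg) { (void)arg; vm->running = 0; }
static void op_push(TokenVm *vm, uint32_t arg) { vm->stack[vm->sp++] = (int32_t)arg; }
static void op_add(TokenVm *vm, uint32_t arg) {
    (void)arg;
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}
static void op_dec(TokenVm *vm, uint32_t arg) { (void)arg; vm->stack[vm->sp - 1] -= 1; }
static void op_over(TokenVm *vm, uint32_t arg) {
    (void)arg;
    vm->stack[vm->sp] = vm->stack[vm->sp - 2];
    vm->sp++;
}
static void op_swap(TokenVm *vm, uint32_t arg) {
    (void)arg;
    int32_t t = vm->stack[vm->sp - 1];
    vm->stack[vm->sp - 1] = vm->stack[vm->sp - 2];
    vm->stack[vm->sp - 2] = t;
}
static void op_drop(TokenVm *vm, uint32_t arg) { (void)arg; vm->sp--; }
static void op_jnz(TokenVm *vm, uint32_t arg) {
    /* The branch target lives in the same cell as the token. */
    int32_t flag = vm->stack[--vm->sp];
    if (flag != 0) vm->pc = arg;
}

/* The jump table: every token goes through this lookup before any work. */
static const Handler jump_table[] = {
    op_halt, op_push, op_add, op_dec, op_over, op_swap, op_drop, op_jnz
};

int32_t run_tokens(const uint32_t *code, unsigned long *dispatches) {
    TokenVm vm = {0};
    vm.running = 1;
    while (vm.running) {
        uint32_t cell = code[vm.pc++];
        uint32_t token = cell >> OPERAND_BITS;   /* unpack the token */
        uint32_t arg = cell & OPERAND_MASK;      /* unpack the operand */
        vm.dispatches++;
        jump_table[token](&vm, arg);             /* indirect jump per cell */
    }
    if (dispatches) *dispatches = vm.dispatches;
    return vm.stack[vm.sp - 1];
}

/* Sums n + (n-1) + ... + 1 on the token VM; n must be in 1..4095. */
int32_t sum_down_from(uint32_t n, unsigned long *dispatches) {
    const uint32_t program[] = {
        CELL(T_PUSH, n), CELL(T_PUSH, 0),
        /* cell 2, loop top: stack holds ( n acc ) */
        CELL(T_OVER, 0), CELL(T_ADD, 0),
        CELL(T_SWAP, 0), CELL(T_DEC, 0), CELL(T_SWAP, 0),
        CELL(T_OVER, 0), CELL(T_JNZ, 2),
        CELL(T_SWAP, 0), CELL(T_DROP, 0), CELL(T_HALT, 0)
    };
    return run_tokens(program, dispatches);
}
```

Calling `sum_down_from(100, &d)` returns 5050. In this sketch each pass through the loop decodes seven cells to perform a single addition, and every one of those cells pays for the unpack, the table lookup and the indirect jump.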
Which was still incredibly fast for the day, given that Forth was compiled to an intermediary format with the Forth interpreter acting as a very primitive virtual machine. This interpretation step had considerable overhead, and in inner loops with few instructions the overhead would be massive. For every one instruction doing actual work you'd have a whole slew of them assigned to bookkeeping and stack management. What in C would compile to a few machine instructions (which a competent assembly programmer of the time would be able to significantly improve upon) would in Forth result in endless calls to lower and lower levels.
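A rough way to see the bookkeeping described above is to count it. The sketch below is a minimal threaded inner interpreter in C, not any particular Forth. The `Word` layout, the word names (`SQUARE`, `FOURTH`) and the `fourth_power` function are hypothetical, chosen only to show how colon definitions nest down to primitives.

```c
#include <stddef.h>

typedef struct Word Word;
typedef struct ThreadVm ThreadVm;

struct Word {
    void (*code)(ThreadVm *vm, const Word *self); /* code field */
    const Word *const *body;                      /* thread; NULL for primitives */
};

typedef struct {
    unsigned long dispatches;  /* trips through the inner interpreter */
    unsigned long nests;       /* colon definitions entered */
    unsigned long multiplies;  /* the only "real work" in this example */
} Stats;

struct ThreadVm {
    const Word *const *ip;         /* instruction pointer into a thread */
    const Word *const *rstack[8];  /* return stack */
    int rsp;
    long dstack[8];                /* data stack */
    int dsp;
    Stats stats;
};

/* Enter a colon definition: save ip, continue in the word's body. */
static void do_colon(ThreadVm *vm, const Word *self) {
    vm->rstack[vm->rsp++] = vm->ip;
    vm->ip = self->body;
    vm->stats.nests++;
}
/* Leave a colon definition: restore the caller's ip. */
static void do_exit(ThreadVm *vm, const Word *self) {
    (void)self;
    vm->ip = vm->rstack[--vm->rsp];
}
static void do_halt(ThreadVm *vm, const Word *self) { (void)self; vm->ip = NULL; }
static void do_dup(ThreadVm *vm, const Word *self) {
    (void)self;
    vm->dstack[vm->dsp] = vm->dstack[vm->dsp - 1];
    vm->dsp++;
}
static void do_mul(ThreadVm *vm, const Word *self) {
    (void)self;
    vm->dsp--;
    vm->dstack[vm->dsp - 1] *= vm->dstack[vm->dsp];
    vm->stats.multiplies++;
}

static const Word W_DUP  = { do_dup,  NULL };
static const Word W_MUL  = { do_mul,  NULL };
static const Word W_EXIT = { do_exit, NULL };
static const Word W_HALT = { do_halt, NULL };

/* : SQUARE  DUP * ; */
static const Word *const SQUARE_BODY[] = { &W_DUP, &W_MUL, &W_EXIT };
static const Word W_SQUARE = { do_colon, SQUARE_BODY };

/* : FOURTH  SQUARE SQUARE ; */
static const Word *const FOURTH_BODY[] = { &W_SQUARE, &W_SQUARE, &W_EXIT };
static const Word W_FOURTH = { do_colon, FOURTH_BODY };

static const Word *const MAIN_THREAD[] = { &W_FOURTH, &W_HALT };

/* Runs FOURTH on x and optionally reports the bookkeeping counts. */
long fourth_power(long x, Stats *stats) {
    ThreadVm vm = {0};
    vm.dstack[vm.dsp++] = x;
    vm.ip = MAIN_THREAD;
    while (vm.ip != NULL) {        /* the inner interpreter ("NEXT") */
        const Word *w = *vm.ip++;
        vm.stats.dispatches++;
        w->code(&vm, w);
    }
    if (stats) *stats = vm.stats;
    return vm.dstack[vm.dsp - 1];
}
```

With these definitions `fourth_power(3, &s)` returns 81 and reports 11 dispatches and 3 nested calls for just 2 multiplications. A C compiler would typically turn `x*x*x*x` into a couple of multiply instructions with no dispatch at all, which is roughly the gap being described.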
There were later Forth implementations that improved on this by compiling to native code but I never had access to those when I was still doing this.
For a lark I wrote a Forth in C rather than bootstrapping it through assembly, and it performed quite well. Forth is ridiculously easy to bring up: it is essentially a few afternoons' work to go from zero to highway speeds on a brand new board that you have a compiler for. Which is one of the reasons it is still a favorite for initial board bring-up.
One area where Forth usually beat out C by a comfortable margin was code size. Forth code tends to be extremely compact (and devoid of any luxury). On even the smallest microcontrollers (8051 for instance, and later Microchip parts and such) you could get real work done in Forth.