by david-given 3814 days ago
Years ago I wrote a compiler (for bytecode) in awk. Ummm... http://cowlark.com/mercat, although you'll need to wade through zip files to get at the source. It was 1.6 kloc for a fully typed Algol-ish language producing stack-based bytecode.

awk's a lovely little language, and deserves to be better known than it is. Its two big failings are local variable syntax and absence of structured types... and the standard library's a bit mad in places (gsub, sigh). But it's expressive and concise and still readable, and meets its core competency of doing easy text processing beautifully.

The most recent thing I wrote in it was this:

https://github.com/EtchedPixels/FUZIX/blob/master/Applicatio...

That's a C file which is also an executable shell script which contains an embedded awk script. The whole thing's a Forth interpreter. Running the file uses the awk script to compile a Forth subset into bytecode and patch the source file with the new bytecode, which allows me to keep the whole thing in a single source file. It's not what I'd call good awk, but it's incredibly effective awk...
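For anyone wondering how one file can be both C and a shell script, here is a minimal sketch of the general trick, not the actual fforth source. The shell treats `#if 0` as a comment and runs the lines after it until `exit`; the C preprocessor skips everything up to `#endif`. The awk one-liner, the `//@C` scan, and `builtin_word_count` are all made up for illustration, and this version only reads the file, whereas the real thing patches bytecode back into it.

```c
#if 0
# Shell half (hypothetical): sh sees the lines starting with # as comments
# and runs the rest. The C preprocessor skips everything up to #endif.
awk '/^\/\/@C/ { print "found Forth definition: " $2 }' "$0"
exit 0
#endif

/* C half: a hypothetical stand-in for the interpreter proper. */

//@C SPACES
// (a real file would carry the Forth source of the word in comments here)

static const char *const builtin_words[] = { "DUP", "DROP", "SPACES" };

/* Number of entries in the illustrative word table above. */
int builtin_word_count(void)
{
    return (int)(sizeof builtin_words / sizeof builtin_words[0]);
}
```

Saved as, say, `demo.c`, running `sh demo.c` executes only the shell half, while a C compiler sees only the part after `#endif`. One thing to watch: shell comments inside the `#if 0` block must not start with words like `if` or `endif`, or the preprocessor will take them as directives.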

2 comments

It looks like you fished the awk file out - I found http://cowlark.com/mercat/com.awk.txt linked directly on that page, which, at 1610 total lines (1518 SLOC counting commented-out code), sounds like exactly what you're referring to.

As for fforth.... your signoff at the end of the comments sums it up much better than anything I could say.

  # No evil was harmed in the making of this file. Probably.

This thing is absolutely awesome... a self-modifying tri-language source file, implementing Forth in just 22KB (or 34KB on x64). Very nice.

Now to go read the, um,

  panic: unrecognised word: help

...documentation? :P

It actually happens that I've recently become really interested in Forth implementations and systems, so discovering this is especially cool... and on that note, what sources would you recommend I study to get an overview of Forth history and development? I've read enough historical anecdotes to understand there are conflicting opinions (as always), but nothing thus far has shown the evolution of the language itself, how ANS became a thing, and so forth.

PS. For the smallest binary, clang-3.7 -Os is the winner on i386 and gcc-5.3 -Os on x64. tcc-0.9.26, interestingly, comes second on both (26KB and 36KB respectively). (Using Slackware-current.)

PPS. Your site's About section might want to know the Antix website seems to have been taken over by a spam system.

That is brilliant!

So the embedded FORTH compiler written in AWK reads the FORTH code in a comment like this:

  //@C SPACES
  // \ n --
  //   BEGIN
  //     DUP 0>
  //   WHILE
  //     SPACE 1-
  //   REPEAT
  //   DROP

and compiles it into C code like this (reformatted here to help illustrate):

  COM(
      spaces_word, codeword, "SPACES", &space_word,
          (void*)&dup_word, (void*)&more0_word,
      (void*)&branch0_word, (void*)(&spaces_word.payload[0] + 8),
          (void*)&space_word, (void*)&sub_one_word,
      (void*)&branch_word, (void*)(&spaces_word.payload[0] + 0),
      (void*)&drop_word,
      (void*)&exit_word
  )

Yup! COM() is a varargs macro that assembles the data in memory. The word layout is not the traditional one Forth uses, to keep it C friendly, but the end result is a linked list of Forth words in exactly the same format that user words have, which the user dictionary extends.
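To make the linked-list idea concrete, here is a rough sketch of how a varargs macro could lay out such a dictionary. The `struct word` fields, the fixed payload size, and the `find_word`/`add_word` helpers are assumptions for illustration and are not taken from fforth; only the general shape (name, code pointer, link to the previous word, payload of word pointers) follows the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word layout for illustration; not copied from fforth. */
struct word {
    const char *name;          /* name the user types */
    const struct word *next;   /* previously defined word, or NULL */
    void (*code)(void);        /* C routine that executes the word */
    const void *payload[4];    /* threaded code: pointers to other words */
};

/* Varargs macro: everything after the link pointer becomes the payload. */
#define COM(sym, codefn, wname, prev, ...) \
    static const struct word sym = { wname, prev, codefn, { __VA_ARGS__ } }

static void codeword(void) { /* placeholder for the inner interpreter */ }

/* Built-in words, each linked to the one before it. */
COM(dup_word, codeword, "DUP", NULL, NULL);
COM(drop_word, codeword, "DROP", &dup_word, NULL);
COM(twodrop_word, codeword, "2DROP", &drop_word,
    &drop_word, &drop_word, NULL);

/* Head of the dictionary; user definitions are pushed on the front. */
static const struct word *latest = &twodrop_word;

/* Walk the list from newest to oldest, as a Forth dictionary search does. */
const struct word *find_word(const char *name)
{
    for (const struct word *w = latest; w != NULL; w = w->next)
        if (strcmp(w->name, name) == 0)
            return w;
    return NULL;
}

/* Extend the dictionary at run time with a user-defined word. */
void add_word(struct word *w)
{
    w->next = latest;
    latest = w;
}
```

Because built-in and user words share one struct, a single lookup loop covers both, which is presumably the point of keeping the formats identical.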

It all means that the C source can just be compiled in a single step --- gcc -o fforth fforth.c --- without needing a precompilation stage, which makes it vastly easier to manage.

It's even portable!