Hacker News new | ask | show | jobs
by Scaevolus 4764 days ago
I've been looking for a DynASM tutorial for ages, and wanted to use it to make a fast BF interpreter-- this is great!

I added some optimizations using Flex to identify patterns, and got a 3.8x speedup on the Mandelbrot benchmark -- https://github.com/rmmh/beefit

1 comments

Very nice! I wouldn't have thought to use flex to recognize optimizable patterns. :) It crashes for me on OS X but I'll have to try it on Linux.

You may have tried this already, but where you have:

    mov al, byte [PTR]
Usually you want to write this instead to avoid a partial register stall (also in case there's junk sitting in the register).

    movzx  eax, byte [PTR]
Hm, the OSX crash is probably because they have a different ABI from Linux -- Google "osx ia32 abi" for details-- I'm probably violating stack alignment.

Partial register stalls are where you write part of the register, then read the full register:

    mov al, byte [PTR]
    add ebx, eax
But I am emitting this:

    mov al, byte [PTR]
    add byte [PTR+3], al
So there's no stall.

I got jit4 about as far as is reasonable for a single-pass compiler, but I intend to implement a proper BF->IR->ASM compiler to implement some more complex multi-pass optimizations I have in mind.

Yes, I have encountered the problem before that OS X crashes on incorrect stack alignment even where other platforms tolerate it.

Good point about the stall -- I always thought that partial register stall happened at the point that you do the partial write, since the logical contents of the register now depend on its previous contents. I didn't realize that the dependency logic was sophisticated enough to allow the partial read without depending on the entire register's value.