On OS X 10.1 with LuaJIT 2.1.0 alpha, a straight translation of the C code runs in 59 seconds on my machine, which is close to the C speed on the same machine (38 seconds compiled with -O3).
In Lua, variables are global by default. Declare all your variables as local and you should see a significant improvement. Lua also has a boolean type, so you can use true and false directly instead of comparing numbers.
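To illustrate both points, here is a minimal sketch. It is not your code: the function name count_primes and the sieve itself are made up for the example. The parts that matter are that every variable is declared with local and that the flags are real booleans rather than 0/1 numbers.

```lua
-- Hypothetical example (not the original poster's code): count the primes
-- below n with a simple sieve, using only locals and real booleans.
local function count_primes(n)
  local composite = {}          -- composite[i] is true once i is crossed out
  local count = 0
  for i = 2, n - 1 do
    if not composite[i] then    -- nil is falsy, so no "== 0" comparison is needed
      count = count + 1
      for j = i * i, n - 1, i do
        composite[j] = true     -- a boolean flag instead of the number 1
      end
    end
  end
  return count
end

print(count_primes(100))  --> 25
```

Without the local keyword, composite and count would be globals, and every access would go through a lookup in the global table instead of a local slot. That usually costs more, especially inside hot loops.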
If you see anything wrong with it, or odd, feel free to share.
I'm still investigating what makes the run so slow, so if you can spot something wrong in my code, that would help.