Replacing math.mod(n, i) with (n % i) gives roughly a 9.4x speedup.
EDIT: The LuaJIT version was 2.0.4, on Mac OS X.
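
For anyone wanting to try the same swap, here is a minimal sketch. The benchmarked code isn't shown here, so count_divisors is a made-up stand-in for a loop that takes a remainder on every iteration; only the math.mod to % change comes from the measurement above.

```lua
-- Hypothetical stand-in for the benchmarked loop (the original code is not shown).
-- Counts the divisors of n by trial division.
local function count_divisors(n)
  local count = 0
  for i = 1, n do
    -- Previously: if math.mod(n, i) == 0 then
    -- The % operator avoids a library function call on every iteration.
    if n % i == 0 then
      count = count + 1
    end
  end
  return count
end

print(count_divisors(12))
```

One caveat if you apply this elsewhere: % and math.fmod agree for non-negative operands like the ones here, but differ for negative ones (-5 % 3 is 1, while math.fmod(-5, 3) is -1).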