Hacker News
by Rochus 696 days ago
> Pallene beats C when the code uses many Lua data structures.

How can it beat C if it just transpiles to C? And accessing string-named fields in a table is still done via hashing, even in Pallene, isn't it?

> Also, rewriting from Lua to Pallene is much less work than rewriting it in C.

Staying in LuaJIT is even less work.


The difference is the Lua-C API. The default Lua-C API is designed for humans: it is stable and safe to use, but every operation pays the cost of a function call and of passing data through the Lua stack. Pallene bypasses the API and reaches into the Lua tables directly. This is much faster, but it would be impractical without the Pallene compiler: the internal struct layouts are unstable, and accessing them directly is unsafe unless you're very careful.
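To make the difference concrete, here is a toy model in C. It is a hypothetical sketch, not Lua's or Pallene's real internals: `ToyTable` and `ToyStack` are made-up stand-ins for a Lua table and the Lua stack, and the `toy_*` helpers only mimic the push/read/pop pattern of the real `lua_rawgeti`, `lua_tonumber` and `lua_pop`. The API-style loop does several calls and a stack round trip per element; the direct loop just reads the layout it knows about.

```c
#include <stddef.h>

/* Hypothetical toy model, NOT Lua's real data structures. */
typedef struct {
    double *items;   /* array part of the "table" */
    size_t len;
} ToyTable;

typedef struct {
    double slots[16];
    int top;         /* number of values currently on the stack */
} ToyStack;

/* Push t[i] onto the stack (mimics lua_rawgeti). */
static void toy_push_index(ToyStack *s, const ToyTable *t, size_t i) {
    s->slots[s->top++] = t->items[i];
}

/* Read the value on top of the stack (mimics lua_tonumber(L, -1)). */
static double toy_to_number(const ToyStack *s) {
    return s->slots[s->top - 1];
}

/* Drop n values from the stack (mimics lua_pop(L, n)). */
static void toy_pop(ToyStack *s, int n) {
    s->top -= n;
}

/* API style: three calls and a stack round trip for every element. */
double sum_via_api(const ToyTable *t) {
    ToyStack s = { .top = 0 };
    double total = 0.0;
    for (size_t i = 0; i < t->len; i++) {
        toy_push_index(&s, t, i);
        total += toy_to_number(&s);
        toy_pop(&s, 1);
    }
    return total;
}

/* Direct style: what a compiler that knows the struct layout can emit. */
double sum_direct(const ToyTable *t) {
    double total = 0.0;
    for (size_t i = 0; i < t->len; i++) {
        total += t->items[i];
    }
    return total;
}
```

Both functions compute the same sum. In this toy, a C compiler may inline the `static` helpers and erase the gap; the real Lua API functions live in the Lua library and generally stay genuine calls, which is where the per-element cost comes from.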
> How can it beat C

It doesn’t have the Lua-to-C interop overhead. You can mitigate that overhead by working on batches in C, but if you have a large, complicated dataset in Lua and need to iterate through it from C, the overhead is paid on every access, so stepping into C doesn’t necessarily give you “the performance of C”.

If, on the other hand, you’re dropping into C to do something like decode a compressed stream, the interop overhead is negligible compared to the work done in C. However, that interop overhead will be present wherever you put the boundary layer...
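A back-of-the-envelope way to see why the overhead is “constantly additive”: with k boundary crossings of cost c each and useful C work W, the share of time spent at the boundary is k·c / (k·c + W). The sketch below uses a hypothetical `CostModel` with made-up unit costs, not measurements. Iterating element by element makes k grow with the data (k = n, W = n·w), so the share stays at c / (c + w) however large the dataset gets; a single call that decodes a big stream has k = 1 and a large W, so the share drops to nearly zero.

```c
/* Hypothetical back-of-the-envelope model: the unit costs are made-up
   parameters, not measurements of Lua, LuaJIT or Pallene. */
typedef struct {
    double crossing_cost;  /* cost of one Lua<->C boundary crossing */
    double work_per_item;  /* useful work done in C per item */
} CostModel;

/* Share of total time spent crossing the boundary:
   (crossings * c) / (crossings * c + items * w). */
static double overhead_fraction(CostModel m, double items, double crossings) {
    double overhead = crossings * m.crossing_cost;
    double work = items * m.work_per_item;
    return overhead / (overhead + work);
}
```

With equal unit costs, 1000 items crossed one at a time spend half the time at the boundary, the same share as a million items crossed one at a time, while one batched call over 1000 items spends about 0.1%.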

> Staying in LuaJIT is even less work.

Maybe! Tracking down unexpected performance regressions is more work than fixing type errors reported by the compiler, and your LuaJIT results suggest that typically a C subroutine (and perhaps consequently a Pallene subroutine) will enjoy a 4× speed advantage over the LuaJIT version, which might save you a lot of optimization work elsewhere.

> tracking down unexpected performance regressions is more work

In particular, you have to know a lot of technical details about LuaJIT, which essentially defeats the purpose of using Lua in the first place. In my case I used LuaJIT as a backend, generating bytecode directly and taking LuaJIT internals into account as far as possible. But it's still much slower than, e.g., Mono.

> your LuaJIT results suggest that typically a C subroutine (and perhaps consequently a Pallene subroutine) will enjoy a 4× speed advantage

The Lua implementation of Are-we-fast-yet apparently doesn't take LuaJIT internals into account; it's just fairly idiomatic Lua, not optimized for speed. So it doesn't represent what LuaJIT can achieve performance-wise, but it gives a good impression of how LuaJIT performs in general applications.