Hacker News
by qikInNdOut 1311 days ago
I work on a game that uses Lua for most of its logic, sitting on top of a C/C++ engine. Years ago one of the engine developers integrated LuaJIT and found that, for actual performance, the difference between the interpreter and the JIT was negligible. The most expensive thing was the switch between Lua code and C code, which with a large API becomes a constant, ugly background performance loss.

The Lua code by itself never ran long enough to benefit much from optimization.

So back then we discussed possible solutions. One idea was to notice an upcoming C call in the bytecode ahead of execution and check whether its arguments were already stable. A background thread would then extract the values, do the call's argument and return-value processing, push the values onto the stack, and finally set a "valid" bit to unblock the C call (which by then is no longer really a call). Neither side ever suffers a complete cache eviction, and both live happily ever after. Unfortunately I have a game dev addiction, so nothing ever came of it.

But similar minds may have pursued similar ideas, so I'm asking the hive for this jive: has anyone seen something like this in the wild?

6 comments

I actually measured it: around 65-85 ns for Lua and 55 ns for LuaJIT (on a 7950X). Despite looking like a small value, it adds up every time you make a call. And of course you can multiply it by 2-5x once it's in production.

A decent emulator these days should have under 20 ns time to first instruction. The worst emulators are the ones that must be hosted in another thread so that execution can be timed out using some kind of timed wait (e.g. a futex operation), which adds a massive 10-20 microseconds of overhead just to hand off tasks.

Ultimately, a script in a game engine either makes a lot of API calls, in which case we want the "system call overhead" to be that of an opaque function call (about 4 ns), or it shares memory with the engine in order to read out properties and build abstractions on top of them (probably not relevant to Lua). So it's important both to get going fast and to be able to make hundreds of engine API calls without unnecessary cost.

Benchmarks: https://gist.github.com/fwsGonzo/2f4518b66b147ee657d64496811...

Sorry if my comment is naive or ignorant, but have you tried using LuaJIT's FFI instead of the traditional int (lua_State *) API? Or is that already with the FFI?

I thought the FFI promised almost seamless Lua->C calls, and it can even pass Lua functions into C as callbacks with native signatures (though that direction of the bridge is still said to be expensive).

I'm not really answering your question here, but rather speaking more generally about calls to functions bound through the Lua C API.

LuaJIT has trace stitching, which is meant to help code that makes a lot of C calls. It was disabled on April 28, 2015 because the design was flawed, but re-enabled on August 29 of the same year with a working design.

Beyond that, if you cannot use the FFI, the usual recommendation is to reduce the number of calls and perhaps cache values on the Lua side.

And as mentioned in other comments, if you can use the FFI, use it. The performance benefit is significant.

The downside is that the FFI is unsafe, so in game development you'd have to be careful about third-party scripts from mods and the like.

Another downside is that if you cannot use the JIT (as on iOS), the performance hit of FFI calls can be worse than going through the Lua C API.

I believe LuaJIT also has a C FFI that should allow faster function calls than the native Lua API.
In CPUs this is called speculative execution. It's a good idea if you can detect side-effect-free code (and don't run into the sandboxing issues that caused problems on CPUs).
Fatshark?