by tenderlove
1202 days ago
I'll take a stab at this.
YARV (Ruby's VM) is already direct threaded (using computed gotos), so there's no dispatch loop to eliminate.
YARV is a stack based virtual machine, and the machine code that YJIT generates writes temporary values to the VM stack. In other words, it always spills temporaries to memory. We're actively working on keeping things in registers rather than spilling.
Ruby programs tend to be extremely polymorphic. It's not uncommon to see call sites with hundreds of different classes (and now that we've implemented object shapes, hundreds of object shapes). YJIT is not currently splitting or inlining, so we unfortunately encounter megamorphic sites more frequently than we'd like.
I'm sure there's more stuff but I hope this helps!
I've seen different people mean different things by "direct threaded". Do you mean the IR is a list of bytecode handler addresses, and the end of every handler is a load plus an indirect jump? Or is there also a dispatch table? In my experience, duplicating the dispatch sequence into every handler (i.e. no dispatch "loop") is worth 10-40%, and eliminating the dispatch table on top of that gains a bit more.
CPUs work hard to predict indirect branches these days, but the BTB is only so big. Getting rid of any indirect call or jump, whether or not it goes through a dispatch table, is a big win, perhaps 2-3x, because CPUs have enormous reorder buffers now and can really load a ton of code if branch prediction is good, which it won't be for any large program with pervasive indirect jumps.
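To make the two meanings concrete, here is a minimal sketch in GNU C (computed gotos are a GCC/Clang extension, not standard C). The three-instruction VM, the `Cell` type, and both function names are made up for illustration and are not YARV's actual code. The first interpreter duplicates the dispatch sequence in every handler but still goes through a dispatch table; the second pre-translates opcodes into handler addresses so each handler ends with just a load and an indirect jump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-instruction stack VM, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Variant 1: no dispatch loop, but still a dispatch table.
   Each handler ends with its own "load opcode, index table, jump". */
int64_t run_token_threaded(const int64_t *code) {
    static void *const table[] = { &&op_push, &&op_add, &&op_halt };
    int64_t stack[16];          /* no overflow checks: this is a sketch */
    size_t sp = 0;
    const int64_t *pc = code;

    goto *table[*pc++];
op_push:
    stack[sp++] = *pc++;        /* operand follows the opcode */
    goto *table[*pc++];
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *table[*pc++];
op_halt:
    return stack[sp - 1];
}

/* A cell of translated code: either a handler address or an immediate. */
typedef union { void *handler; int64_t imm; } Cell;

/* Variant 2: direct threading. Opcodes are translated to handler
   addresses once, so dispatch is a load plus an indirect jump with no
   table lookup. Returns -1 if the program does not fit the buffer. */
int64_t run_direct_threaded(const int64_t *code, size_t len) {
    static void *const table[] = { &&op_push, &&op_add, &&op_halt };
    Cell prog[64];
    int64_t stack[16];          /* no overflow checks: this is a sketch */
    size_t sp = 0;
    size_t i = 0;
    Cell *pc = prog;

    if (len > 64) return -1;
    while (i < len) {           /* one-time translation pass */
        int64_t op = code[i];
        prog[i].handler = table[op];
        i++;
        if (op == OP_PUSH && i < len) {
            prog[i].imm = code[i];
            i++;
        }
    }

    goto *(pc++)->handler;
op_push:
    stack[sp++] = (pc++)->imm;
    goto *(pc++)->handler;
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *(pc++)->handler;
op_halt:
    return stack[sp - 1];
}
```

Both variants still end every handler with an indirect jump, which is the part that leans on the BTB; only compiling to straight-line machine code removes it.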
> it always spills temporaries to memory. We're actively working on keeping things in registers rather than spilling.
In my experience that can be a 2x-4x performance win.
> It's not uncommon to see call sites with hundreds of different classes
Sure, the question is always about the dynamic frequency of such call sites. What kind of ICs does YARV use? Are monomorphic calls inlined?
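For readers unfamiliar with the term, a monomorphic inline cache can be sketched roughly like this. This is a generic illustration in C, not how YARV or YJIT implements its caches; `Class`, `Object`, `CallSite`, `slow_lookup`, and `call_speak` are all hypothetical names. The hit and miss counters are there only to show what "dynamic frequency" means: a site that sees hundreds of classes but is dominated by one of them still hits the cache most of the time.

```c
#include <stddef.h>

typedef int (*Method)(int);

/* Hypothetical class with a single method slot. */
typedef struct Class {
    const char *name;
    Method speak;
} Class;

typedef struct Object {
    const Class *klass;
} Object;

/* Per-call-site cache: remembers the last receiver class and its method. */
typedef struct CallSite {
    const Class *cached_class;
    Method cached_method;
    unsigned hits;
    unsigned misses;
} CallSite;

/* Stand-in for a full method lookup (e.g. walking an ancestor chain). */
Method slow_lookup(const Class *klass) {
    return klass->speak;
}

/* Call through the cache: one pointer compare on the fast path,
   a full lookup and cache refill on a miss. */
int call_speak(CallSite *site, const Object *receiver, int arg) {
    if (receiver->klass == site->cached_class) {
        site->hits++;
    } else {
        site->misses++;
        site->cached_class = receiver->klass;
        site->cached_method = slow_lookup(receiver->klass);
    }
    return site->cached_method(arg);
}

/* Two example classes to exercise the cache. */
int dog_speak(int x) { return x + 1; }
int cat_speak(int x) { return x * 2; }

const Class DOG = { "Dog", dog_speak };
const Class CAT = { "Cat", cat_speak };
```

A JIT that inlines monomorphic calls would replace the indirect call on the fast path with the callee's body, guarded by the same class check.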