by hiyer 788 days ago
On my machine, the YJIT version of the original code is only ~30% faster than the non-YJIT version:

    ~/scripts > ruby fib.rb
    2.3346780000720173
    ~/scripts > ruby --yjit fib.rb
    1.5913339999970049
So it looks like YJIT doesn't "know" about this optimization.

Ah, thanks so much for trying it out. Interesting that it couldn't figure out that code path.
I think it's just a matter of time. YJIT is still fairly young and doesn't do extensive inlining at the moment. If it did inline the block it could see the array is unused and avoid the allocation.

Running the original fib benchmark (i.e., without the author's technique to eliminate the array allocation) on an M1 Pro, I see:

  CRuby 3.3.1:
  2.058589000022039

  CRuby 3.3.1 w/ YJIT:
  1.4314430000958964

  TruffleRuby 24.0.1 (Native):
  0.20155820800573565

  TruffleRuby 24.0.1 (JVM):
  0.1336908749944996
I took the best time out of three for each implementation, but there wasn't much variance overall. Standard caveats about benchmarking on an actively used laptop apply.
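
For anyone who wants to reproduce a best-of-three measurement, here is a minimal sketch. The `fib` method and the `best_of` helper are assumptions made for illustration; they are not the benchmark file used for the numbers above, and the input size is deliberately small.

```ruby
require "benchmark"

# Hypothetical stand-in workload; not the fib.rb discussed in this thread.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Run the given block `runs` times and return the fastest
# wall-clock time in seconds.
def best_of(runs = 3)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# RubyVM::YJIT only exists on CRuby builds with YJIT support,
# so guard the check to keep the script portable to other engines.
yjit = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
puts "#{RUBY_ENGINE} #{RUBY_VERSION} (YJIT: #{yjit ? 'on' : 'off'})"
puts best_of(3) { fib(25) }
```

Taking the minimum rather than the mean is one way to reduce noise from other processes on a busy laptop. It does not account for JIT warmup, which matters more for TruffleRuby and YJIT than for the plain interpreter.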

Running the new prime_counter benchmark that the crystalruby author mentions in another thread¹, I see:

  Crystal 1.12.1 (LLVM 18.1.4) w/ crystalruby 0.2.0 in CRuby 3.3.1:
  0.34096299996599555

  CRuby 3.3.1:
  2.9615250000497326

  CRuby 3.3.1 w/ YJIT:
  1.640430000028573

  TruffleRuby 24.0.1 (Native):
  0.2504862080095336

  TruffleRuby 24.0.1 (JVM):
  0.25282600001082756
YJIT and TruffleRuby make different trade-offs, so I'm not trying to say the latter is necessarily better. But I think the TruffleRuby numbers show what is possible in terms of Ruby optimization. Unfortunately, there's currently an issue in TruffleRuby with one of the crystalruby gem's dependencies², so I had to extract the Ruby benchmark out to a separate file.

¹ -- https://news.ycombinator.com/item?id=40153218

² -- The method_source gem used by crystalruby catches exceptions and matches against the message³ for some conditional handling. TruffleRuby 24.0 now uses Prism as its parser, and Prism has an exception message with slightly different wording from CRuby. Consequently, method_source's handling doesn't work with Prism. It's hard to say where the compatibility issue lies, since exception messages aren't stable APIs. We'll get it sorted out.

³ -- https://github.com/banister/method_source/blob/06f21c66380c6...

> TruffleRuby 24.0.1 (JVM):
> 0.1336908749944996

Those are impressive numbers for running the unoptimized code. I might give TruffleRuby a shot!