Hacker News new | ask | show | jobs
by wrs 789 days ago
I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance. I mean, that would be nearly impossible.

Ruby code, perhaps more than most code, is written for readability and “beauty”. It’s a part of Ruby culture that I greatly appreciate. But if you care about performance, you will act differently, regardless of language. And the whole point of this code is to show that if you care about performance above all else, there’s of plenty of room to maneuver in interpreted Ruby.

2 comments

That's an interesting question, how does the YJIT perform using the original code? Does it find the optimizations that it results in the same gain such that you don't actually need to personally know the optimization?
On my machine, the YJIT version of the original code is only ~30% faster than the non-YJIT version

    ~/scripts > ruby fib.rb                                                                                                                                                                                                                                                                     
    2.3346780000720173
    ~/scripts > ruby --yjit fib.rb                                                                                                                                                                                                                                                           
    1.5913339999970049
So looks like YJIT doesn't "know" about this optimization
Ah thanks so much for trying it out. Interesting that it couldn't figure out that code path
I think it's just a matter of time. YJIT is still fairly young and doesn't do extensive inlining at the moment. If it did inline the block it could see the array is unused and avoid the allocation.

Running the original fib benchmark (i.e., without the author's technique to eliminate the array allocation) on an M1 Pro, I see:

  CRuby 3.3.1:
  2.058589000022039

  CRuby 3.3.1 w/ YJIT:
  1.4314430000958964

  TruffleRuby 24.0.1 (Native):
  0.20155820800573565

  TruffleRuby 24.0.1 (JVM):
  0.1336908749944996
I took the best time out of three for each implementation, but there wasn't that much variance over all. Standard caveats about benchmarking on an actively used laptop apply.

Running the new prime_counter benchmark that the crystalruby author mentions in another thread¹, I see:

  Crystal 1.12.1 (LLVM 18.1.4) w/ crystalruby 0.2.0 in CRuby 3.3.1:
  0.34096299996599555

  CRuby 3.3.1:
  2.9615250000497326

  CRuby 3.3.1 w/ YJIT:
  1.640430000028573

  TruffleRuby 24.0.1 (Native):
  0.2504862080095336

  TruffleRuby 24.0.1 (JVM):
  0.25282600001082756
YJIT and TruffleRuby make different trade-offs, so I'm not trying to say the latter is necessarily better. But, I think the TruffleRuby numbers show what are possible in terms of Ruby optimization. Unfortunately, there's currently an issue in TruffleRuby with one of the crystalruby gem's dependencies³, so I had to extract the Ruby benchmark out to a separate file. incompatibility.

¹ -- https://news.ycombinator.com/item?id=40153218

² -- The method_source gem used by crystalruby catches exceptions and matches against the message² for some conditional handling. TruffleRuby 24.0 now uses to Prism as its parser and Prism has an exception message with slightly different wording from CRuby. Consequently, method_source's handling doesn't work with Prism. It's hard to say where the compatibility issue lies, since exception messages aren't stable APIs. We'll get it sorted out.

³ -- https://github.com/banister/method_source/blob/06f21c66380c6...

> TruffleRuby 24.0.1 (JVM): > 0.1336908749944996

That's impressive numbers for running the unoptimized code. I might give TruffleRuby a shot!

>I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance.

It is, indeed, a fundamental goal of ruby that there are multiple ways to write the same thing, and that the programmer should not need to understand nuances of the compiler.

"I need to guess how the compiler works. If I'm right, and I'm smart enough, it's no problem. But if I'm not smart enough, and I'm really not, it causes confusion. The result will be unexpected for an ordinary person. This is an example of how orthogonality is bad." -matz 2003.

That's about semantics, not performance. I don't think Matz would say that a goal of Ruby is to prevent you from improving performance using whatever knowledge you do have about the compiler.

In this particular example, the fact that assignments return the value of the right-hand side is well-known and used frequently in Ruby code. The fact that arrays have to be allocated is obvious. The fact that allocations have a runtime cost is obvious. The only thing that isn't obvious is that the return value allocation of assignments whose value isn't used are optimized away. If you know that, you'll think of appending the nil to activate that optimization. Characterizing the lack of that step as a "mistake" only makes sense if the goal for your code is to maximize performance -- which in this case, most unusually for Ruby, it was.