by vidarh 4666 days ago
The numbers for Ruby on Truffle are meaningless until people have had a chance to hit it hard with all the particular oddities of Ruby. Consider that their 6 months of work got them to 45% of RubySpec. You can reach 45% of RubySpec fairly easily if you go for the softest targets (I'm not saying that's what they've done - I haven't checked).

[EDIT: I see they're doing some interesting things that certainly ought to beat MRI. If I understand it correctly it seems like they are somehow collapsing type checks for multiple operations. Of course the devil is in the details - if they are trying to defer type or method checks, and throw away results if the checks fail (which should be rare), that will only be safe if the modifications that do happen do not introduce or remove side effects that can't be "rolled back", but I might be misunderstanding their presentation]

The problem is the multitude of bizarre things that are legal Ruby. Like people doing eval("class Fixnum; def + other; 42; end; end;"). Yes, that's legal, and yes, that means any integer arithmetic in your application is suddenly broken. More importantly, it means any optimisations based on your beliefs about what any piece of code is meant to do, while most likely right, can turn out to be horribly wrong. That makes them problematic for a VM or compiler unless it carries a substantial amount of logic to detect the change and bail out from optimised code to safe fallbacks. Doing so without slowing down the code when your guesses are right is hard because of how many ways there are of changing the behaviour of code in Ruby.
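
To make that concrete, here is a minimal sketch of the eval trick. It assumes Ruby 2.4 or later, where Fixnum was unified into Integer, so it patches Integer instead. The `__original_plus` alias is a made-up name used only to undo the damage afterwards.

```ruby
# Keep a handle on the real Integer#+ so the damage can be undone afterwards.
# (__original_plus is a hypothetical helper name, not part of Ruby.)
class Integer
  alias_method :__original_plus, :+
end

before = 1 + 1

# The new definition only exists as a string until run time, so a compiler
# looking at this file cannot see it coming.
eval("class Integer; def +(other); 42; end; end")
during = 1 + 1

# Restore the built-in behaviour.
class Integer
  alias_method :+, :__original_plus
  remove_method :__original_plus
end
after = 1 + 1

puts [before, during, after].inspect  # => [2, 42, 2]
```

The same expression `1 + 1` gives three answers depending on what has been evaluated in between, which is exactly what an optimiser has to guard against.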

Unless your compiler understands eval() and it is possible for it to reason about the contents of the eval string, it can make pretty much zero guarantees about the state of the world after an eval() call, and so it can make pretty much zero guarantees about the state of the world after any method call that could reach such an eval() call.

Admittedly, that's a stupid thing to do, but it's legal in Ruby, and while the above example is extreme, you do find a lot of use that is roughly equivalent. E.g. autoload creates as much lack of predictability as eval. So does a 'require' or 'load' that might get triggered later in execution, for example.
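
A rough sketch of the autoload case, assuming a throwaway file in a temp directory. `Invoice`, `LatePatch` and `late_patch.rb` are made-up names for illustration. Merely touching a constant is enough to load a file that rewrites an unrelated method:

```ruby
require "tmpdir"

class Invoice
  def total
    100
  end
end

before = after = nil

Dir.mktmpdir do |dir|
  # Hypothetical file that defines LatePatch and, as a side effect,
  # reopens Invoice.
  path = File.join(dir, "late_patch.rb")
  File.write(path, <<~RUBY)
    module LatePatch; end

    class Invoice
      def total
        0
      end
    end
  RUBY

  # Nothing is loaded yet; the file is only registered for LatePatch.
  autoload :LatePatch, path

  invoice = Invoice.new
  before = invoice.total

  # The first reference to the constant triggers the load.
  LatePatch.name
  after = invoice.total
end

puts before  # => 100
puts after   # => 0
```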

The reason those are important is that it makes a massive amount of optimisations far harder: You can't blindly cache method pointers, for example, because any method call potentially invalidates them. You can't even cache class pointers, because they can change: You can return from a method call and suddenly an object has an eigenclass. You can't inline functions without guarding them somehow to fall back to the full method call when it turns out some idiot did redefine Fixnum#+. You can't assume seemingly "safe" stuff like Fixnum#+(some other Fixnum) will even return an object of the type you assume, for the same reason - someone might decide to implement a DSL that redefines it.
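
A small sketch of the eigenclass point, using made-up names (`Greeter`, `tamper`): a call that looks harmless from the outside changes where method lookup for the object ends up.

```ruby
class Greeter
  def greet
    "hello"
  end
end

# Looks harmless to the caller, but quietly gives its argument a singleton
# class (eigenclass) with its own #greet.
def tamper(obj)
  obj.define_singleton_method(:greet) { "hijacked" }
end

g = Greeter.new
owner_before  = g.method(:greet).owner  # Greeter
result_before = g.greet

tamper(g)

owner_after  = g.method(:greet).owner   # g's singleton class
result_after = g.greet

puts result_before                       # => hello
puts result_after                        # => hijacked
puts owner_before == Greeter             # => true
puts owner_after == g.singleton_class    # => true
```

Anything cached about `g.greet` before the call to `tamper` (the method, or even the class used for lookup) is stale afterwards.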

Frankly, it'd be fantastic to start deprecating some of the more obnoxious things like these, and weeding out the few uses of them, but as it stands today, a fast Ruby subset is "easy". A fast complete Ruby implementation is an entirely different beast. A fast incomplete Ruby implementation that refuses to support some of the most noxious corner cases would still be extremely useful for a lot of people, though.

(in the interest of disclosure since I'm talking about another Ruby implementation: I'm writing a series on my own slow process of writing a Ruby compiler, though my goals are very different - mostly focused on writing about the process)


I'm the author of Ruby on Truffle.

I'll talk you through exactly how we solve the problem of redefining Fixnum, as one example of how we've tackled these problems.

Whenever you use Fixnum#+ in one of your methods, we look up what that method is and cache the method so we can call it quickly next time. We actually never again check that this cache is still valid. The trick is that we sort of do the opposite - any time you do something that could invalidate that cache, we find the installed machine code that uses it, and delete it. If the machine code is still running somewhere on some stack for some thread or fibre, we jump from the machine code into an interpreted version which looks up the method again and carries on.
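
As a toy model of that idea in plain Ruby (this is not how Truffle is implemented; `CallSite`, `Counter` and `CALL_SITES` are hypothetical names, and the real thing discards machine code rather than clearing a variable): the fast path never validates its cache, and the cost of keeping it honest is paid by whoever redefines the method.

```ruby
# A call site that caches the method it found and never re-checks it.
class CallSite
  attr_reader :name, :lookups

  def initialize(klass, name)
    @klass = klass
    @name = name
    @cached = nil
    @lookups = 0
  end

  # Fast path: reuse the cached method with no validity check.
  def call(receiver, *args)
    @cached ||= lookup
    @cached.bind(receiver).call(*args)
  end

  # Invoked from the redefinition side, never from the fast path.
  def invalidate!
    @cached = nil
  end

  private

  def lookup
    @lookups += 1
    @klass.instance_method(@name)
  end
end

class Counter
  CALL_SITES = []

  # Ruby calls this hook whenever an instance method is (re)defined.
  def self.method_added(name)
    CALL_SITES.each { |site| site.invalidate! if site.name == name }
  end

  def step(n)
    n + 1
  end
end

site = CallSite.new(Counter, :step)
Counter::CALL_SITES << site
counter = Counter.new

first = site.call(counter, 1)
second = site.call(counter, 1)
lookups_before = site.lookups

# Redefinition fires method_added, which throws the cached method away.
class Counter
  def step(n)
    n + 100
  end
end

third = site.call(counter, 1)
lookups_after = site.lookups

puts [first, second, lookups_before, third, lookups_after].inspect
# => [2, 2, 1, 101, 2]
```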

So Kernel#eval makes no difference - if something that you eval ruins your later cached method calls in the same method, that's not a problem because if you're still running the same machine code, then you can't have redefined Fixnum#+. If you had redefined it, you'd be back in the interpreter getting ready to compile again with new caches.

I'll also just point out that running RubySpec means we are successfully running something like 5000 lines of off-the-shelf unmodified systems code, just for the harness before we even get to the tests.

Our theory is that we can make Ruby very fast, without having to forgo any of your favourite random dynamic monkey-patching features.

Watch the video: http://medianetwork.oracle.com/video/player/2623645003001

Join us on the mailing list: http://mail.openjdk.java.net/mailman/listinfo/graal-dev

Would you mind providing some "grand order of things" hand-wavey estimate as to when exactly the public can expect to have Fast Ruby ™ © ®?

Also, will it be the very same Ruby we all know, compatible with everything? i.e. will it be the Christmas I envision?

I'm afraid I can't - sorry. Keep an eye on the mailing list or follow me on twitter (@ChrisGSeaton) though.
Done, thanks. And good luck to you guys.
>Unless your compiler understands eval() and it is possible for it to reason about the contents of the eval string, it can make pretty much zero guarantees about the state of the world after an eval() call, and so it can make pretty much zero guarantees about the state of the world after any method call that could reach such an eval() call.

That's also true for Javascript, but it hasn't stopped three separate teams (Mozilla, Apple, Google) making it crazy faster than Ruby.

First of all "Ruby" is not an implementation. The performance differences between MRI and the different patched versions of it, MacRuby, MagLev, JRuby, Rubinius etc. can be substantial.

[edit: I referenced https://github.com/cogitator/ruby-implementations/wiki/List-... here, but then I read through the rest of the list, and too much of it is "junk", so I added the list of the more mature implementations above instead]

But the point is not that making speed improvements is impossible. I strongly believe Ruby can be made as fast as current JS implementations or better.

But there's a vast gap between being able to, and for performance numbers based on 45% of RubySpec to mean anything about what we can expect to see in terms of performance of this particular implementation.

I'm working on a Ruby compiler myself (woefully incomplete; certainly vastly faster than MRI on the tiny subset it can compile, but pointless to benchmark for exactly the reasons I stated: I have no way of telling how much performance it'll lose to deal with method invalidation etc.), and I absolutely love that more people try to implement Ruby and it'd be fantastic if these performance gains stay as they flesh it out.

As someone else pointed out: Several Ruby implementations started out with impressive numbers on some subset of the language. Then they slowed down more and more as code was added to handle the corner cases of the language.

Maybe these guys will do better. Maybe they won't. The current benchmark does not tell us either way.

>First of all "Ruby" is not an implementation.

Sure, but I was obviously referring to MRI. Not to mention that, back in my day, Ruby WAS an implementation.

>But there's a vast gap between being able to, and for performance numbers based on 45% of RubySpec to mean anything about what we can expect to see in terms of performance of this particular implementation.

Sure I agree.

>As someone else pointed out: Several Ruby implementations started out with impressive numbers on some subset of the language. Then they slowed down more and more as code was added to handle the corner cases of the language.

I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage. Perhaps add some good stuff from Python in for good measure.

Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?

> I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage.

No, if you want languages that are less expressive but faster, there are plenty of options available. The thing is, MRI itself keeps getting faster while implementing the whole of Ruby. New Ruby implementations that implement the easiest bits first tend to start out much faster than the mainline interpreter on code that only uses those most-straightforward-to-implement bits, and then tend to converge closer to the speed of the mainline interpreter as the implementation gets more complete. That doesn't mean it would be better to make a new language. Much of the experience of those alternative interpreters is relevant to making improvements in the mainline interpreter that speed up, or otherwise improve, "complete" Ruby (including times when the alternative interpreter becomes the mainline interpreter, as occurred with YARV, which was an alternative interpreter before it became the mainline interpreter in 1.9).

> Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?

If you are referring to mruby, that's an embedded-oriented Ruby implementation, not a Ruby-like language.

Losing too much of Ruby would make it pointless. But there are certainly edge cases that we could lose readily and not care very much.

When was the last time you saw anyone redefine operators on Fixnum for a good reason, for example?

My pet peeve is small things like freezing many of the base classes by default, for example. As well as a "meta programming module" in the standard library that'd gather up as much as possible of what people (ab)use eval() for today in specific, narrowly tailored methods that implementations could provide specialised versions of. A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing - anything that'd combine to allow implementations to defer type and method cache guards and invalidations as long as possible would make it far easier to improve performance, with very little impact on most developers.

> When was the last time you saw anyone redefine operators on Fixnum for a good reason, for example?

Special-case restrictions like the ones that would be necessary to prevent this increase, not decrease, the overall complexity of the language, even if they decrease the complexity of the implementation. That makes it harder to keep a mental grasp on the language.

> As well as a "meta programming module" in the standard library that'd gather up as much as possible of what people (ab)use eval() for today in specific, narrowly tailored methods that implementations could provide specialised versions of.

Rather than such a module, the normal evolution of the core has actually been to include, in the appropriate places (usually as methods on Object, Module, or Class), methods that capture the common use cases of eval. But those uses evolve, and removing general purpose eval would both limit the flexibility of the language and limit the signal that eval usage patterns provide for the future development of the language.
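
For illustration, a short sketch of the same accessors written once with string eval and once with the narrower core methods (`define_method`, `instance_variable_get`, `public_send`, `const_get`). The `Settings` class and the `_via_eval` / `_via_api` suffixes are made up for the example.

```ruby
class Settings
  # String eval: the method body is opaque text until run time.
  %w[host port].each do |field|
    class_eval("def #{field}_via_eval; @#{field}; end")
  end

  # Narrower tools doing the same job without evaluating source text.
  %w[host port].each do |field|
    define_method("#{field}_via_api") { instance_variable_get("@#{field}") }
  end

  def initialize(host, port)
    @host = host
    @port = port
  end
end

s = Settings.new("example.org", 8080)

puts s.host_via_eval                                     # => example.org
puts s.port_via_api                                      # => 8080
puts s.public_send(:host_via_api)                        # => example.org
puts Object.const_get("Settings").instance_methods(false).sort.inspect
# => [:host_via_api, :host_via_eval, :port_via_api, :port_via_eval]
```

Both halves end up defining equivalent methods; the difference is only in how much an implementation can know about what is being defined before it runs.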

> A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing

An eval that interprets a different language that is almost like the Ruby that the implementation it is hosted in evaluates would be yet another layer of complexity in the language.

> A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing - anything that'd combine to allow implementations to defer type and method cache guards and invalidations as long as possible would make it far easier to improve performance, with very little impact on most developers.

While the developers that directly use the impacted features might be a small number, the developers that use software that uses the impacted features under the hood would be much bigger.

There are plenty of languages that aim to be performant at the cost of expressiveness. There's no reason for Ruby to change to compete in that space.

>No, if you want languages that are less expressive but faster, there are plenty of options available.

I'm not sure there are in the style we're talking about. Only Lua comes to mind. Maybe I'd add Julia there too.

A modern Python/Ruby replacement, built for speed and with a large-ish community would be nice to see. Even with static inferred typing.

It's not like we can't have new languages anymore. After all both Ruby/Python came out of nowhere around 1992-4, a time when there was no modern web and even fewer resources to grow a language.

> A modern Python/Ruby replacement, built for speed and with a large-ish community would be nice to see.

Both Python and Ruby have performance as main areas of focus for improvement, having largely met their goals in terms of expressiveness. So, to a large extent, that's what each new version of Python and Ruby already is.

If you mean "built for speed first", then there are plenty of those (though they aren't really expressly aimed as Python/Ruby replacements, because "built for speed first" isn't Python or Ruby's focus, so something built that way isn't really targeting Python or Ruby, even though it may be targeting some subset of the places where Python and Ruby are currently applied.)

Many new languages fit this niche (languages designed with performance as a key focus that target some subset of the use cases of Python/Ruby.) But for the most part, they aren't very Python/Ruby-like, because the difference in goals leads to much bigger changes than slicing off features of Python/Ruby.

> It's not like we can't have new languages anymore.

No, it's like we have lots of new languages, as well as lots of existing languages, and lots of use for improved versions of existing languages, so it doesn't make a lot of sense for people who aren't the people involved to say that people currently working on new Ruby implementations should stop working on them and instead work on new "Ruby-like" languages with reduced features that fit niches that aren't what Ruby is targeted at, but are what other existing and new languages are already targeted at.

Both because there are plenty of people already working on what you want, and because "people should stop working on what is important to them and spend their time working on what is important to me" is generally silly when you aren't paying the people in question for their time/effort.

> After all both Ruby/Python came out of nowhere around 1992-4

If you are talking first public release, that's "1991-1995", if you are talking 1.0 release that's "1994-1996" (in both cases, Python and then Ruby.)

The Ruby developer in me wants to have his lunch and eat it.

I'm sure there's a way to have the crazy whole that is Ruby AND the speed. Dammit, Javascript is fast now and it's not like it was more optim-ready than Ruby is to begin with.

> I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage.

I would very much like to see that. But I think we first need more competition to MRI.

E.g. note the proposal for "refinements" for Ruby 2.1 which is an implementation nightmare for anyone that wants a fast implementation. See Charles Nutter's (of JRuby fame) comments about it, for example. Since MRI is as slow as it is, there's little incentive to keep features like that out - it won't kill MRI performance.

It's also hard to determine exactly what the viable subset should be, since most of us who would like a faster Ruby also love Ruby a lot because of how dynamic it is, when it is used right, and would jump ship if the language lost too much flexibility in the quest for that performance.

Part of my own motivation for writing about writing a Ruby compiler is exploring what parts of Ruby can be implemented efficiently because frankly it's hard to even guess exactly how an "efficient Ruby" should look.

There are some obvious problems for implementations that want to boost performance, like too much reliance on eval() and defining eigenclasses on objects, as well as autoload, require and include, all of which can worst case trash all method caching and optimisations all over the place.

But throwing out all of that would be brutal, especially given common Ruby patterns like dynamically require'ing everything in a directory at application startup, which should be fine, vs. a "require" occurring later in execution. And it's not clear that there aren't other common patterns that'll cause a lot of pain.

I think we will at least see implementations with options to disable support for certain things, or with support for letting applications declare "from now on, no shenanigans" so the implementation can take shortcuts. Either would be very helpful.

There are a lot of things developers could do themselves that would let even a compliant implementation speed up. E.g. calling #freeze on all classes you have no intention of modifying, somewhere that is easily identified by relatively superficial analysis, would make a massive difference (suddenly you can cache lots of extra methods, and even inline and unroll things like "each" loops in many cases that'd otherwise be an expensive flurry of method calls).
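
A minimal sketch of what freezing a class buys you, using a made-up `Point` class: once frozen, any later attempt to redefine one of its methods raises instead of silently changing behaviour, so nothing cached about `Point#+` can go stale.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def +(other)
    Point.new(x + other.x, y + other.y)
  end
end

# Declare the class closed for modification.
Point.freeze

error = begin
  class Point
    def +(other)
      :surprise
    end
  end
  nil
rescue => e  # FrozenError on Ruby 2.5+, a plain RuntimeError before that
  e
end

# Instances still work as before; only the class definition is locked.
sum = Point.new(1, 2) + Point.new(3, 4)

puts error.class.ancestors.include?(RuntimeError)  # => true
puts [sum.x, sum.y].inspect                        # => [4, 6]
```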

Other things are about developer practice: Freeze all objects you don't want to modify ASAP on creation. A good implementation could make good use of that too. But today there is no incentive for Ruby users to write Ruby that is amenable to fast execution because the implementations don't take advantage of it.

Once there is an implementation that demonstrates the advantages of writing Ruby to a subset that is more amenable to fast execution, I think it'd be possible to get traction for deprecating and removing features that are a performance nightmare.

> Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?

Sort of, though mruby appears to focus on size and ability to embed rather than specifically picking a subset that's amenable to a fast implementation. It's still a bytecode interpreter, for example, and it's not even aiming for complying with the (already limited) ISO standard for Ruby.