Matz mentioned offhand at the last RubyConf that Nobu has been experimenting with running without the GIL. It's scary, because running code without the GIL will expose a lot of existing concurrency bugs that have been safely hidden so far. A lot of code works fine with the GIL only because the GIL prevents the full parallelism that would exercise every possible race condition, so it's unlikely they'd remove it in anything other than a major release, i.e. Ruby 3.0.
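A typical hidden bug is an unsynchronized read-modify-write such as `count += 1`. Even with a GIL this is a separate load, add and store, but thread switches happen rarely enough that lost updates seldom show up; true parallelism makes them common. A minimal sketch of the safe pattern, written in Python for illustration (the `Counter` class and the thread/iteration counts are hypothetical), guards the update with an explicit lock so the result no longer depends on a GIL at all:

```python
import threading


class Counter:
    """A shared counter whose read-modify-write is guarded by an explicit lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` is a load, an add and a store; without the lock,
        # another thread could run between those steps and an update would be lost.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


def run(num_threads: int = 8, times: int = 10_000) -> int:
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 80000
```

Dropping the `with self._lock:` line turns this into exactly the kind of code that "works fine with the GIL" most of the time and breaks once threads really run in parallel.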
Another issue, the one CPython has hit repeatedly when trying to remove the GIL (and the big sticking point for the core team in general), is that the single-threaded performance of a naive interpreter suffers a lot when the GIL is split up: instead of a single lock that is taken and released relatively rarely, the interpreter has to juggle many small locks that it must acquire and release far more often. When David Beazley looked back at the GIL-less patch for CPython 1.4[0], he measured slowdowns of 4x to 7x.
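A toy model of the "many small locks" problem: the sketch below counts lock acquisitions only, not time, and assumes (hypothetically) an interpreter that releases its GIL every `switch_interval` operations versus one that takes a lock on every object each operation touches. It illustrates why acquisition counts explode; it is not a model of CPython's or MRI's actual scheduling.

```python
import threading


class CountingLock:
    """Wraps threading.Lock and counts how many times it is acquired."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self) -> "CountingLock":
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc) -> None:
        self._lock.release()


def run_with_gil(ops: int, switch_interval: int) -> int:
    """Toy interpreter holding one global lock, released every `switch_interval` ops."""
    gil = CountingLock()
    done = 0
    while done < ops:
        with gil:
            # Execute a whole batch of "bytecodes" under a single acquisition.
            done += min(switch_interval, ops - done)
    return gil.acquisitions


def run_with_fine_locks(ops: int, objects_per_op: int) -> int:
    """Toy interpreter that locks every object an operation touches."""
    locks = [CountingLock() for _ in range(objects_per_op)]
    for _ in range(ops):
        for lock in locks:
            with lock:
                pass  # stand-in for reading or writing the object
    return sum(lock.acquisitions for lock in locks)


if __name__ == "__main__":
    print(run_with_gil(ops=10_000, switch_interval=100))      # 100
    print(run_with_fine_locks(ops=10_000, objects_per_op=2))  # 20000
```

Even when each acquisition is uncontended and cheap, a 200x difference in how often it happens shows up directly in single-threaded throughput.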
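In CPython's case, though, most of that cost comes from reference counting: once there is no single global lock, every incref and decref has to be made thread-safe. MRI is not refcounted, so the performance hit of finer-grained locking would probably be smaller.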
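Why refcounting dominates: in a refcounted VM even read-only use of an object writes to it, because copying and dropping a reference adjusts its count. The hypothetical `RefCounted` class below protects its count with a per-object lock (real free-threading work has used atomic instructions or other schemes instead) to show that the synchronization cost scales with every reference handoff. A tracing collector such as MRI's does not write to an object when a reference is passed around, so this particular overhead does not arise there.

```python
import threading


class RefCounted:
    """Hypothetical object whose reference count is guarded by its own lock,
    as it would have to be once no global lock serializes access to it."""

    def __init__(self) -> None:
        self.refcount = 1
        self.lock_acquisitions = 0
        self._lock = threading.Lock()

    def incref(self) -> None:
        with self._lock:
            self.lock_acquisitions += 1
            self.refcount += 1

    def decref(self) -> None:
        with self._lock:
            self.lock_acquisitions += 1
            self.refcount -= 1


def pass_around(obj: RefCounted, calls: int) -> None:
    # Each time a reference is copied (e.g. passed as an argument) and later
    # dropped, a refcounted VM touches the count twice, i.e. two lock round-trips.
    for _ in range(calls):
        obj.incref()
        obj.decref()


if __name__ == "__main__":
    obj = RefCounted()
    pass_around(obj, 1_000)
    print(obj.refcount, obj.lock_acquisitions)  # 1 2000
```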
This is where JRuby/Truffle is exciting. JRuby has been exciting for a long time and never quite reached its potential, but with these latest projects it could become the main Ruby implementation, and these complex issues are already largely solved in the JVM.
I really doubt JRuby will ever become the main Ruby implementation, as awesome as it is. The fact is that too much of the Ruby ecosystem is built on calling out to C, and JRuby officially dropped its C extension (cext) support a while back. And while FFI is good, it's often not nearly as flexible.
There's also the fact that, because of JRuby's slow startup times, the modify/test cycle can be quite frustrating compared to working with MRI.
The truth is that the vast majority of Ruby applications never reach the point where JRuby's steady-state, post-warmup performance on pure Ruby outweighs the fast startup and moderate performance of MRI plus C extensions. And the ones that do reach that point often have a better case for moving to something else anyway (including other JVM technologies, which can now be brought in via JRuby and eventually replace the whole thing).
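The trade-off described in that comment is essentially a break-even calculation: total time is startup cost plus per-request cost times the number of requests, so the slow-starting runtime only wins after enough work to amortize its startup. A back-of-the-envelope sketch; the millisecond figures in the usage line are hypothetical placeholders, not measurements of MRI or JRuby, and the model treats steady-state cost as constant from the first request (it ignores JIT warmup, which flatters the slow starter):

```python
from typing import Tuple


def total_time(startup: float, per_request: float, requests: int) -> float:
    """Wall-clock cost of starting once and then serving `requests` units of work."""
    return startup + per_request * requests


def break_even_requests(quick_start: Tuple[float, float],
                        fast_steady: Tuple[float, float]) -> float:
    """Requests after which the slow-starting, faster-steady-state runtime wins.

    Each argument is (startup_cost, per_request_cost) in the same unit.
    """
    start_a, cost_a = quick_start
    start_b, cost_b = fast_steady
    if cost_b >= cost_a:
        return float("inf")  # the slow starter never catches up
    return (start_b - start_a) / (cost_a - cost_b)


if __name__ == "__main__":
    # Hypothetical: 500 ms startup at 10 ms/request vs 5000 ms startup at 5 ms/request.
    print(break_even_requests((500, 10), (5000, 5)))  # 900.0
```

For short scripts, test runs and the edit/test loop, the request count stays far below any such break-even point, which is the comment's argument in numbers.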
Truffle can reach better performance on pure Ruby than MRI with C extensions. Agreed, though: it'll be an uphill battle to change the interpreter of choice, and JRuby will need much faster startup for that to work.
Building an MRI/CPython/Perl 5-style VM without a GIL is not much harder than building the same thing with a GIL. The problem is that converting an existing implementation to stop using its GIL takes a comparable amount of work.