by _hardwaregeek 2617 days ago
Very cool! I didn't think this would happen, as Matz has expressed disinterest in adding type annotations. However, keeping an open mind and reconsidering one's positions are the hallmarks of a great leader :D

I worked on a summer project to add type annotations to Ruby. Didn't get very far since I ran into some challenges with the internals of the parser and the parser library, Ripper. I'm extremely interested in seeing how the Ruby team designs the type system. It'll be gradual, of course, but it'll also be interesting to see what adaptations they'll have to make to accommodate existing code. JavaScript relied on a lot of stringly typed code, so TypeScript added string literal types. Perhaps Ruby's dynamic, block-oriented style could lead to some interesting decisions in the type system.
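For anyone who hasn't run into string literal types, here is a rough analogy using Python's `typing.Literal` rather than TypeScript or Ruby. The `HttpMethod` type and both functions are made up for illustration, and this is only a sketch of the idea, not how TypeScript or any Ruby checker implements it:

```python
from typing import Literal, get_args

# Hypothetical example type: only these exact strings are valid values.
HttpMethod = Literal["GET", "POST", "DELETE"]


def is_http_method(value: str) -> bool:
    """Runtime counterpart of the static check: is value one of the literals?"""
    return value in get_args(HttpMethod)


def send(method: HttpMethod, url: str) -> str:
    # A static checker such as mypy would flag send("FETCH", ...);
    # the interpreter itself does not enforce the annotation.
    return f"{method} {url}"
```

The point is that "stringly typed" code keeps using plain strings, but the checker knows which strings are allowed.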

Not to mention, the types will most likely be reified as per Ruby's philosophy.

Super excited for this. Between the JIT and types, Ruby could definitely see a renaissance in the near future.


Indeed Sorbet does have literal types for strings and Ruby symbols. We're still figuring out the details and converging on a common type system for Ruby3, but we've found them super useful, as you rightly point out!

And +1 on the Ruby renaissance! Super excited about all the exciting things that are currently being built!

I can't wait for Sorbet's open sourcing! Ngl I tried decompiling the wasm binary just for fun. Not that it ended up being readable haha
We're currently looking for beta users! Reach out to us at sorbet@stripe.com. If you describe your team and codebase in the email, it will help us figure out what cohort to include you in.
I honestly think that more than Matz reconsidering his own opinions, it probably turned out that having types is an instrumental thing to enable performance improvements.

Keep in mind, Ruby development is headed towards a goal that the dev team has called "3x3" as in Ruby 3 aims to be three times faster than current Ruby implementation.

My recollection is that 3x3 is a goal to be 3x faster than Ruby 2.0. Presumably many of those gains have already been realized, so best not to depend on tripling _current_ performance.
Yeah, the NES emulator that's one of the benchmarks is about 1.8x faster so far, so only about another 1.7x to go on top of that (3 / 1.8 ≈ 1.67).
>it probably turned out that having types is an instrumental thing to enable performance improvements.

I was disappointed to find out that adding more types in Perl6 actually slows down performance.

I wonder what the differences are that adding types in one language speeds it up, while adding types in another language slows it down.

It depends on what the type annotations do. I'm not sure how Perl 6 does it, but in Python, for example, type annotations are not enforced at runtime, so they don't have any impact on performance. We'll see how much, and for what, Ruby 3 actually wants to use the type information. Sorbet on its own is unlikely to affect runtime either.
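A quick way to see the Python behaviour for yourself (the function name `double` is just an example): the annotations are kept as metadata on the function object, but the interpreter does not check them when the call happens.

```python
def double(x: int) -> int:
    # The annotation says int, but nothing checks it at call time.
    return x * 2


# Annotations are stored as metadata on the function object...
annotations = double.__annotations__

# ...and a call with a str still runs, using str's own * operator.
result = double("ab")
```

A static checker like mypy would complain about `double("ab")`, but that happens before the program runs, not during it.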
Going to keep praying for type/performance optimizations in Python so we can all get past the "python is slow" thing.

Async python is an absolute joy to develop with.

Care to elaborate which type of work you're doing and which libraries you're using?
Libraries:

aiohttp: web framework

aiopg: async postgres driver with SQLAlchemy support

asyncssh: async ssh library with SFTP capability

I generally work on CRUD microservices to automate some steps of a business workflow - activating/registering a resource with our vendors, generating and updating pricing, picking up new files off an FTP site and processing.

Type checking is runtime, although if the static optimizer can figure out that a certain call will never work at compile time, it will throw a compile time error.
On the other hand, if the static optimizer can figure out that a certain call will always work at compile time, it can remove the runtime check for that part.
Could you elaborate on how you got to the conclusion that adding types in Perl 6 slows things down? They shouldn't, unless you create types that actually run Perl 6 code during type checking. Which is usually not the case.
>Could you elaborate on how you got to the conclusion that adding types in Perl 6 slows things down?

I was playing around with adding types to everything in my program. Creating kind of a little Haskell style script. I was sad when it ran slower than it did without the types. Someone informed me that that is the expected outcome because (as you said) type checking is done at run time.

The thing is that even if you do not specify a type, you've implicitly specified the `Any` type. And type checking (which always happens at runtime, whether or not you've explicitly specified any types) will be done against that.

So I'm very curious as to what code exposed a slowdown after explicit types were added.

What is the rationale of adding types to a language that will still retain all performance penalties from the need to have dynamic typing code to interact with non-typed data?
The story didn't start as 'add types to Ruby'. It starts with someone having a codebase of hundreds of thousands of lines of Ruby, dedicated to financial software, and the effort they put into keeping said codebase from costing a lot of money. In those situations, you can go as far as to evaluate how much each bug deployed to production costs you.

Quite a few large companies have found themselves in this situation: Very large codebases in a programming language without types stop being fast to develop in. Then you get to either rewrite everything, with the well documented risks, or start doing all kinds of other things to make programming safer, like banning certain parts of the language, until eventually dedicating a team to improve the language is the most cost effective way to go.

In this case, I am also pretty certain that the interaction with data started having informal types a while ago too.

What I find really interesting here is that what starts as a library to help a single company handle the subset of Ruby they were using in the first place now aims to be good enough for general purpose Ruby outside of said company. It's one thing to have problems with an experimental, home-made thing, and just get support via slack, but adding this to the language has a far higher barrier. This is also probably the reason it's not OSS yet: The code that is enough for production use in Stripe's approach to Ruby might not be the greatest in a random codebase with different opinions on how many dynamic methods you want to have.

So it's not that a team decides to add types to Ruby instead of just picking a language that already has types: it's solving a private problem and, a while later, realizing that the solution is accidentally very close to being good enough for the language.

A lot of great insight in this comment.

The only difference is that Stripe foresaw the problems and has been working on productivity for quite a while, with a dedicated group of people who help our engineers by building tools and abstractions. For example, https://youtu.be/lKMOETQAdzs was done by the same org a couple of years ago.

The rationale is that types are there to check correctness (types as proofs), not to improve performance.

Speed is not the first and foremost benefit of types. Type checking is (and other stuff that comes with that, like better completions, self-documenting code, etc).

Who's to say we couldn't use the types to make the runtime faster in the future?

One of the reasons why Sorbet does runtime checking[1] in addition to static checking is so that we can know that signatures are accurate, even when a typed method is called from untyped code.

If the signatures are accurate, a future project could take advantage of methods' signatures to make decisions about how the code should actually be run. If the signatures lie, then any runtime optimization made using the types would only be overhead, because the runtime would have to abort the optimization and fall back to just running the interpreter.

[1]: https://sorbet.org/docs/runtime
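To make the "typed method called from untyped code" case concrete, here is a minimal sketch in Python, not Ruby, and not Sorbet's actual implementation. The `checked` decorator and `add_cents` are hypothetical names; the sketch only handles plain classes such as `int` or `str` and skips `*args`/`**kwargs`:

```python
import functools
import inspect


def checked(func):
    """Hypothetical decorator: verify plain-class annotations on every call."""
    sig = inspect.signature(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip unannotated and variadic parameters in this sketch.
            if expected is inspect.Parameter.empty or param.kind in variadic:
                continue
            # Only simple classes are checked; generics are ignored here.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@checked
def add_cents(amount: int, fee: int) -> int:
    return amount + fee
```

Because the check runs on every call, a caller that was never statically checked still cannot pass a `str` where the signature says `int`, which is what lets later tooling trust the signature.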

Well I am skeptical about performance improvements due to type annotations as well. Other languages have similar type systems and didn't get faster.

Dart had gradual types but didn't enforce them at runtime because of performance. The PyPy devs don't believe that type annotations help them for performance (http://doc.pypy.org/en/latest/faq.html#would-type-annotation...). Also there is no JS engine that uses TypeScript annotations so far to improve performance.

Types are usually on the wrong boundary: e.g. Integer doesn't state whether that value fits into a register or is a Bignum.

Also: Aren't some type checks quite expensive? So more expensive than a simple map/class/shape check? E.g. passing an array from untyped code to a signature with something like `Array<Integer>`. Wouldn't a runtime that verifies signatures have to check all elements in the array to be actual Integers?
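A small Python sketch of the cost concern, with hypothetical helper names: checking that something is a list is one test, while checking that it is a list of integers has to touch every element in the worst case.

```python
def check_is_list(values):
    # The cheap "shape" check: one test, regardless of length.
    return isinstance(values, list)


def check_int_list(values):
    """Return (ok, checks): whether every element is an int, and how many
    element checks were needed. Illustration only."""
    checks = 0
    for v in values:
        checks += 1
        if not isinstance(v, int):
            # Early exit on the first mismatch.
            return False, checks
    return True, checks
```

For a valid array of n elements the second check always does n element tests, so a runtime that verifies such signatures on every call would pay that cost on every call unless it samples or caches the result.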

It's because PyPy relies on traced runtime statistics for optimizations via inlining. There's another approach where you translate your typed program into a lower-level target language and compile it into a native binary. See https://github.com/mypyc/mypyc and https://github.com/cython/cython/wiki/Python-Typing-Proposal
Dart has long changed.
True, that's why I wrote "Dart had".
I work on an alternative Ruby implementation, and it looks like I'll be able to take these type definitions and use them to add extra type constraints to my intermediate representation very easily - just insert a type constraining node around each expression that's annotated with a type. It'll remove extra guards and increase performance, so should definitely be an option.
Thank you chris for all your work on Truffle! Looking forward to using it.
I programmed C back in my high school years 14+ years ago. Today I am mostly using JS because it is the cash crop of the industry, and it made me quite some money when I was away from electronics business (my main occupation) for a year after getting troubles with Canadian visas.

To me, it feels that there is a very thick wall in between high level languages and something with raw data access like C, C++, and D. You either completely throw out every convenience feature, or go all in on them.

In C, a lot of data access turns into single digit number of load/store or register access instructions. It is easy to see that it is close to impossible to add fancy data access functionality on top of that without going from single cycles to kilocycles.

I was once told "when you try improving a programming language's performance, it eventually turns into C"

P.S. on JIT - it is not a given that a JIT-compiled language is automatically faster than a well written interpreter on a modern CPU. One of the early tricks for making fast interpreters was to keep as much of the interpreter in cache and data in registers as possible, to benefit from the more or less linear execution flow of unoptimised code in comparison to the unpredictable flow of JIT-generated executable code. Today, with 16MB caches, I think the benefit of that will be even bigger.

> I was once told "when your try improving a programming language performance, it eventually turns into C"

Which is kind of ironic given how bad C compilers generated code during the mid-80s, versus other mainframe languages.

>To me, it feels that there is a very thick wall in between high level languages and something with raw data access like C, C++, and D.

That is why we need something that offers 80% of the speed of C, 80% of the simplicity / expressiveness of JavaScript / Ruby, and 80% of the ease of long-term maintenance of a functional PL like OCaml.

I actually think Java will one day evolve very close to that goal.

Types are the means of "improving language performance" without turning into C. It's all about encoding invariants for the optimizer.

I doubt a competent JIT is ever slower than a competent interpreter, but it may not be that much faster or worth the workload.

It depends on the size of the primitives. An array language could be close to 1:1, while for CPU-level instructions you will struggle to reach 1/6 of JITted perf.

You forget this is ruby, which will not have a competent JIT. They have such a bad JIT with enormous overhead that only very special cases will be faster, most cases are slower and will wait for locks or be racy.
With types you get more compile-time checks: safer code, better documentation, and the possibility to improve runtime performance and FFI. With a guaranteed int you don't need to check for bignum overflows, and you can avoid all runtime type checks. A typed FFI struct can be used as is for the FFI, as raw data. Strings are guaranteed to be 0-delimited.

In certain basic blocks typed ints or floats can be unboxed, if they will not escape. This is what made PHP 7 2x faster. The stack will get much leaner. Simple arithmetic ops can be inlined, using native variants. Ops with typed vars cannot be overridden.

Optimizing is possible even in those cases. A JIT usually runs a function hundreds of times to collect type data before attempting to optimize the function. Types can be used to pre-fill that type data. The JIT can then optimize immediately, but still bailout if the wrong types show up in the future. I wouldn't think such an approach would yield huge benefits overall (the most used code will be optimizing pretty quickly anyway), but on server apps, it could speed up edge-case behaviors a bit.
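Here is a toy Python model of that idea, purely as a sketch. The class name, the 100-call threshold, and the counters are all assumptions for illustration; a real JIT does this on compiled code, not on a Python object:

```python
class SpeculativeAdd:
    """Toy model of JIT type feedback for a two-argument add."""

    WARMUP_CALLS = 100  # assumed threshold before specializing

    def __init__(self, declared_type=None):
        # A declared type pre-fills the feedback, so no warm-up is needed.
        self.specialized_for = declared_type
        self.generic_calls = 0
        self.seen_types = set()
        self.bailouts = 0

    def __call__(self, a, b):
        if self.specialized_for is not None:
            # Guard: the fast path is valid only for the speculated type.
            if type(a) is self.specialized_for and type(b) is self.specialized_for:
                return a + b  # stands in for the optimized code
            # Bailout: drop the specialization and collect feedback again.
            self.bailouts += 1
            self.specialized_for = None
            self.generic_calls = 0
            self.seen_types = set()
        # Generic path: record what types actually show up.
        self.generic_calls += 1
        self.seen_types.update((type(a), type(b)))
        if self.generic_calls >= self.WARMUP_CALLS and len(self.seen_types) == 1:
            self.specialized_for = next(iter(self.seen_types))
        return a + b
```

`SpeculativeAdd()` needs 100 calls before it specializes, while `SpeculativeAdd(declared_type=int)` starts on the fast path and only falls back if a non-int shows up.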

Another feature of even optional types is creating uniformity to allow JIT optimization. A great real-world example of this is TypeScript or ReasonML. The code is converted to JS, but still winds up faster on average. The JS JITs have multiple tiers of optimization. Changing data types and function signatures are the biggest performance killers. If you can ensure a list is always strings or always numbers, then the optimizer can reach the top tier of optimization. When lots of people work together on untyped languages, there tend to be small changes in the signatures and structures that drop you out of that top optimization level. Even partial types are useful for preventing this.

Related to that is the potential for runtime type warnings. Even though the types aren't used by the JIT, it should be possible to give a warning message if the received types don't match up. That could be a huge assistance in finding where a bug is located.

Readability most likely. Type checking tends to also reduce basic bugs from mismatched inputs as well.
so that you can gradually add types?

Now that Ruby has an actual JIT compiler, it could benefit from typing to optimize code further. And a gradual migration process will help people speed up parts of their code. Unless they mess it up like Python, where abstractions are costly.