by baybal2 2617 days ago
What is the rationale for adding types to a language that will still retain all the performance penalties of dynamic typing, since it still needs dynamically typed code to interact with untyped data?
7 comments

The story didn't start as 'add types to Ruby'. It starts with someone having a codebase of hundreds of thousands of lines of Ruby, dedicated to financial software, and the costs they incurred trying to keep said codebase from costing a lot of money. In those situations, you can go as far as to evaluate how much each bug deployed to production costs you.

Quite a few large companies have found themselves in this situation: very large codebases in a programming language without types stop being fast to develop in. Then you get to either rewrite everything, with the well-documented risks, or start doing all kinds of other things to make programming safer, like banning certain parts of the language, until eventually dedicating a team to improving the language is the most cost-effective way to go.

In this case, I am also pretty certain that the interaction with data started having informal types a while ago too.

What I find really interesting here is that what started as a library to help a single company handle the subset of Ruby they were using in the first place now aims to be good enough for general-purpose Ruby outside of said company. It's one thing to have problems with an experimental, home-made thing and just get support via Slack, but adding this to the language has a far higher barrier. This is also probably the reason it's not OSS yet: the code that is good enough for production use in Stripe's approach to Ruby might not be the greatest in a random codebase with different opinions on how many dynamic methods you want to have.

So it's not that a team decides to add types to Ruby instead of just picking a language that already has types. It's solving a private problem and, a while later, realizing that the solution is accidentally very close to being good enough for the language.

A lot of great insight in this comment.

The only difference is that Stripe foresaw the problems and has been working on productivity for quite a while, with a dedicated group of people who help our engineers by building tools and abstractions. For example, https://youtu.be/lKMOETQAdzs was done by the same org a couple of years ago.

The rationale is that types are there to check correctness (types as proofs), not to improve performance.

Speed is not the first and foremost benefit of types. Type checking is (and other stuff that comes with that, like better completions, self-documenting code, etc).

Who's to say we couldn't use the types to make the runtime faster in the future?

One of the reasons why Sorbet does runtime checking[1] in addition to static checking is so that we can know that signatures are accurate, even when a typed method is called from untyped code.

If the signatures are accurate, a future project could take advantage of methods' signatures to make decisions about how the code should actually be run. If the signatures lie, then any runtime optimization made using the types would only be overhead, because the runtime would have to abort the optimization and fall back to just running the interpreter.

[1]: https://sorbet.org/docs/runtime
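To make the "signatures stay accurate even when called from untyped code" idea concrete, here is a minimal sketch in Python rather than Ruby. The `checked` decorator and the `charge` function are hypothetical names made up for illustration; this is not Sorbet's API or implementation, and it only handles plain class annotations.

```python
import functools
import inspect


def checked(func):
    """Hypothetical helper: verify plain class annotations on every call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip unannotated parameters and *args/**kwargs in this sketch.
            if expected is inspect.Parameter.empty:
                continue
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            # Only plain classes are checked; generics are out of scope here.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: {name} expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@checked
def charge(amount_cents: int, currency: str) -> str:
    return f"{amount_cents} {currency}"
```

An untyped caller that passes `charge("500", "usd")` gets a `TypeError` at the boundary, so anything that later relies on the signature can assume `amount_cents` really is an `int`.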

Well, I am skeptical about performance improvements due to type annotations as well. Other languages have similar type systems and didn't get faster.

Dart had gradual types but didn't enforce them at runtime because of performance. The PyPy devs don't believe that type annotations help them for performance (http://doc.pypy.org/en/latest/faq.html#would-type-annotation...). Also there is no JS engine that uses TypeScript annotations so far to improve performance.

Types are usually on the wrong boundary: e.g. Integer doesn't state whether that value fits into a register or is a Bignum.
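A small Python illustration of the "wrong boundary" point, as a sketch only: both values below satisfy the same `int` annotation, yet only one would fit in a 64-bit machine register. The helper name `fits_in_int64` is made up for this example.

```python
def fits_in_int64(n: int) -> bool:
    # Range of a signed 64-bit machine integer.
    return -(2**63) <= n <= 2**63 - 1


small: int = 2**62  # would fit in a register
big: int = 2**64    # needs arbitrary-precision (bignum) storage

# The annotation is identical; the representation an optimizer cares about is not.
same_type = type(small) is type(big)
```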

Also: Aren't some type checks quite expensive? So more expensive than a simple map/class/shape check? E.g. passing an array from untyped code to a signature with something like `Array<Integer>`. Wouldn't a runtime that verifies signatures have to check all elements in the array to be actual Integers?
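To illustrate the cost difference being asked about, here is a rough Python sketch (hypothetical function names, not any particular runtime's behavior). Checking that a value is an array is one class check; checking that it is an array of integers has to visit every element.

```python
def is_array(value) -> bool:
    # O(1): a single class check, comparable to a class/shape guard.
    return isinstance(value, list)


def is_array_of_int(value) -> bool:
    # O(n): every element has to be inspected before the call can proceed.
    return isinstance(value, list) and all(isinstance(x, int) for x in value)


# A long list whose very last element breaks the element type.
values = list(range(1_000)) + ["oops"]
shallow_ok = is_array(values)
deep_ok = is_array_of_int(values)
```

Here the shallow check passes while the deep check only fails after walking all 1,001 elements, which is the overhead in question.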

It's because PyPy relies on traced runtime statistics for optimizations via inlining. There's another approach where you translate your typed program into a lower-level target language and compile it into a native binary. See https://github.com/mypyc/mypyc and https://github.com/cython/cython/wiki/Python-Typing-Proposal

Dart changed long ago.

True, that's why I wrote "Dart had".

I work on an alternative Ruby implementation, and it looks like I'll be able to take these type definitions and use them to add extra type constraints to my intermediate representation very easily: just insert a type-constraining node around each expression that's annotated with a type. It'll remove extra guards and increase performance, so it should definitely be an option.

Thank you Chris for all your work on Truffle! Looking forward to using it.

I programmed in C back in high school, 14+ years ago. Today I mostly use JS because it is the cash crop of the industry, and it made me quite a bit of money when I was away from the electronics business (my main occupation) for a year after running into trouble with Canadian visas.

To me, it feels that there is a very thick wall in between high level languages and something with raw data access like C, C++, and D. You either completely throw out every convenience feature, or go all in on them.

In C, a lot of data access turns into a single-digit number of load/store or register access instructions. It is easy to see that it is close to impossible to add fancy data access functionality on top of that without going from single cycles to kilocycles.

I was once told "when you try improving a programming language's performance, it eventually turns into C"

P.S. on JIT: it is not a given that a JIT-compiled language is automatically faster than a well-written interpreter on a modern CPU. One of the early tricks for making fast interpreters was to keep as much of the interpreter in cache and as much data in registers as possible, to benefit from the more or less linear execution flow of unoptimised code compared with the unpredictable flow of JIT-generated executable code. Today, with 16MB caches, I think the benefit of that will be even bigger.

> I was once told "when you try improving a programming language's performance, it eventually turns into C"

Which is kind of ironic given how bad the code generated by C compilers was during the mid-80s, compared with the mainframe languages of the time.

>To me, it feels that there is a very thick wall in between high level languages and something with raw data access like C, C++, and D.

That is why we need something that offers 80% of the speed of C, 80% of the simplicity and expressiveness of JavaScript or Ruby, and 80% of the ease of long-term maintenance of a functional language like OCaml.

I actually think Java will one day evolve very close to that goal.

Types are the means of "improving language performance" without turning into C. It's all about encoding invariants for the optimizer.

I doubt a competent JIT is ever slower than a competent interpreter, but it may not be that much faster or worth the workload.

It depends on the size of the primitives. An array language could be close to 1:1, while for CPU-level instructions you will struggle to reach 1/6 of JITted performance.

You forget this is Ruby, which will not have a competent JIT. They have such a bad JIT, with enormous overhead, that only very special cases will be faster; most cases are slower and will wait for locks or be racy.

With types you get more compile-time checks: safer code, better documentation, and the possibility of improving runtime performance and FFI. With a guaranteed int you don't need to check for bignum overflows, and you can avoid all runtime type checks. A typed FFI struct can be used as is for the FFI, as raw data. Strings are guaranteed to be 0-delimited.

In certain basic blocks, typed ints or floats can be unboxed if they will not escape. This is what made PHP 7 2x faster. The stack gets much leaner. Simple arithmetic ops can be inlined, using native variants. Ops with typed vars cannot be overridden.

Optimizing is possible even in those cases. A JIT usually runs a function hundreds of times to collect type data before attempting to optimize the function. Types can be used to pre-fill that type data. The JIT can then optimize immediately, but still bail out if the wrong types show up in the future. I wouldn't think such an approach would yield huge benefits overall (the most used code will be optimized pretty quickly anyway), but on server apps, it could speed up edge-case behaviors a bit.
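A toy Python model of that idea, under heavy assumptions: `ToyJitFunction`, its `WARMUP_CALLS` threshold, and its counters are invented for illustration and do not reflect how any real JIT is built. The "fast path" is only a stand-in; the point is the control flow of warm-up, pre-filled type data, guard, and bailout.

```python
class ToyJitFunction:
    WARMUP_CALLS = 100  # arbitrary threshold for this sketch

    def __init__(self, declared=None):
        # `declared` is an optional (type, type) pair taken from annotations.
        self.profile = {}            # observed (type, type) -> call count
        self.specialized = declared  # types the "optimized" path assumes
        self.fast_hits = 0
        self.bailouts = 0

    def __call__(self, a, b):
        observed = (type(a), type(b))
        if self.specialized is not None:
            if observed == self.specialized:
                self.fast_hits += 1  # guard passed: take the fast path
                return a + b         # stand-in for specialized code
            self.bailouts += 1       # guard failed: discard the assumption
            self.specialized = None
            self.profile.clear()
        # Slow path: run generically and collect type data.
        self.profile[observed] = self.profile.get(observed, 0) + 1
        if self.profile[observed] >= self.WARMUP_CALLS:
            self.specialized = observed
        return a + b


cold = ToyJitFunction()                     # no annotations: must warm up
warm = ToyJitFunction(declared=(int, int))  # type data pre-filled from types
for i in range(50):
    cold(i, 1)
    warm(i, 1)

result = warm("a", "b")  # wrong types show up later: bail out, still correct
```

After 50 calls the unannotated function is still warming up, while the annotated one has taken the fast path every time; the final call with strings trips the guard once and falls back to the generic path.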

Another feature of even optional types is creating uniformity to allow JIT optimization. A great real-world example of this is TypeScript or ReasonML. The code is converted to JS, but still winds up faster on average. The JS JITs have multiple tiers of optimization. Changing data types and function signatures are the biggest performance killers. If you can ensure a list is always strings or always numbers, then the optimizer can reach the top tier of optimization. When lots of people work together on untyped languages, there tend to be small changes in the signatures and structures that drop you out of that top optimization level. Even partial types are useful for preventing this.

Related to that is the potential for runtime type warnings. Even though the types aren't used by the JIT, it should be possible to give a warning message if the received types don't match up. That could be a huge assistance in finding where a bug is located.

Readability, most likely. Type checking also tends to reduce basic bugs from mismatched inputs.

So that you can gradually add types?

Now that Ruby has an actual JIT compiler, it could benefit from typing to optimize code further. And a gradual migration process will help people speed up parts of their code. Unless they mess it up like Python, where abstractions are costly.