Hacker News new | ask | show | jobs
by chrisloy 4106 days ago
Compilation speeds are largely dependent on the type of code you're writing. At London's Scala Exchange last year, Odersky claimed he can get 10k lines/second (10x slower than javac, but still not bad) out of scalac by writing simple imperative code, but libraries that make heavy use of the type system and implicit scope can be as slow as 1 line/minute.

I consider it fair to expect the compiler to be slower when it's doing more heavy lifting, but it would definitely be nice to see the exponential blow-up on the tail curbed in future.

edit: To clarify, the numbers above were what Odersky claimed; I haven't verified them. The 1 line/minute was claimed of trying to build Shapeless - I've never seen anything approaching that in the wild.

2 comments

> 1 line/minute

I get that that's an upper limit, but that's still pretty shocking. Is there an overarching reason for this type of performance? Examples of this style of code?

Code with many implicit parameters and higher-kinded types is hard to compile, as the compiler needs to solve essentially a Prolog-ish unification problem. Scalaz is a popular library written largely in this style (e.g. [1]). (FWIW I've never seen 1 line/minute.)

[1] https://github.com/scalaz/scalaz/blob/series/7.2.x/core/src/...

The one line per minute was observed when trying to compile a query over an HList backed record with more than 300 columns. The problem was a polynomial increase of the size of the implicit search tree. This is due to the way implicits are used in shapeless HList library. Slick has a different approach which is much faster, but slightly less general.

The gist is that once you have [edit: recursive] typeclasses or implicits your search can become arbitrarily large and slow. There's nothing that can be done about it by the compiler except arbitrarily pruning the search tree. Or otherwise put, you should factor in implicit search complexity when designing your libraries.

OCaml and Haskell offer very similar functionalities and their compilation times are lightning fast.

The main reason why Scala is slow is because the compiler is poorly written. Take a look at the articles from Paul Phillips, one of the Typesafe cofounders who left the company in frustration, he's probably the person who knows scalac the best and he says the code base is basically hopeless.

Note that 1/3rd of Martin's presentation is about Dotty, and that's basically where he spends most of his time these days, he hardly contributes to Scala any more [1].

[1] https://twitter.com/odersky/status/574665768484339713

I can't speak about OCaml, but Haskell doesn't have subtypes. That's where a lot of the complexity of Scala comes from as well as being a source of annoying bugs:

    def f(x: Int) = if (x % 2 == 0) { x } else { None }
(This compiles, but the type of `f` is not `f: Int => Option[Int]`.)

[edit: the bug is not in the scala compiler, the bug is in my code - I almost certainly did not want `f` to have type `Int => Any`. My phrasing, now edited, was misleading. ]

That is not a bug. That is subtype inference working correctly. If you (mistakenly) expected the return type of f to be Option[Int], say so, and you'll get a compile time error telling you about your mistake:

scala> def f(x: Int): Option[Int] = if (x % 2 == 0) { x } else { None } <console>:7: error: type mismatch; found : Int required: Option[Int] def f(x: Int): Option[Int] = if (x % 2 == 0) { x } else { None }

    scala> def f(x: Int) = if (x % 2 == 0) { x } else { None }
    f: (x: Int)Any
Which is expected. The only shared type between both branches of the `if` expression is `Any`.
This is correct given Scala's limitation: the only common supertype between Int and Option is Any.

A language that supports union types (e.g. Ceylon) will type this expression as Int|Option[Int], which is as specific as you can get.

I believe Ceylon will actually further refine Int | Option[Int] to simply Option[Int].

Option[Int] is simply an alias to the type Int | Null, so Int | Option[Int] would become Int | Int | Null after replacement, which simplifies to Int | Null since Int | Int is equivalent to just Int.

> OCaml and Haskell offer very similar functionalities and their compilation times are lightning fast.

It took only one sentence to show that you never used OCaml or Haskell, so I'm not sure how serious the rest of your claims should be taken.

OCaml compiles pretty fast, especially if you target bytecode. Haskell on the other hand... is still faster than Scala, but not "lightning fast".
> OCaml compiles pretty fast, especially if you target bytecode.

Yes, but OCaml doesn't "offer very similar functionalities".

Haskell is quite a bit slower if you consider that you often have to compile not only your own code, but also the sources of your dependencies in the beginning. Additionally, you have to fight with Cabal, which tends to be very unpleasant due to the lack of any reasonable support for exotic things like "versions". Then you might discover hacks like "sandboxing". Hours not spent coding.

Every turing-complete typesystem can take an unbounded time to compile. Scala is one of those languages with turing-complete typesystems.
I'm not sure that's a sufficient explanation. Idris and Haskell (with extensions) also have Turing complete type systems. I've never seen either even approach performance like 1 line/minute in real code.
It's quite easy to get extremely slow type inference in expressive typing systems. This code

let f0 = fun x -> ( x, x ) in let f1 = fun y -> f0( f0 y ) in let f2 = fun y -> f1( f1 y ) in let f3 = fun y -> f2( f2 y ) in let f4 = fun y -> f3( f3 y ) in let f5 = fun y -> f4( f4 y ) in f5 ( fun z -> z );;

took at least 3 months (!) to type-infer when I ran it on my 2008 MacBook using the standard Ocaml compiler. I have not tried since. Type inference algorithms have horrendous worst-case complexity. Fortunately that worst-case complexity usually doesn't matter in practise.

I never seen 1 line/minute in Scala either, but that's not the point.

Theoretically, there is nothing stopping you from writing code which will never finish compiling if you have a Turing-complete typesystem.

I've written a lot of Scala and I don't think I've seen anything as slow as even 1 line/second. Odersky was talking about Shapeless, which is an edge case in that it is a home for all the type-heavy calculations that scalac can compile but isn't currently optimised towards.

edit: reworded for clarity

Computing a type level factorial can take more than 1 minute :)
Here is a proof that the Scala type system is actually Turing complete:

https://michid.wordpress.com/2010/01/29/scala-type-level-enc...

I can't comment on the details, I'm more of of Clojure guy.

I think you got that wrong by a factor of 10. I get between 500 and 1000 lines / sec on my laptop. It depends on the code style. The more straightforward the code the faster the compile. Still with incremental compilation it means I wait rarely more than a couple of seconds.
Thanks Martin, unfortunately it's past the point where I can edit my original comment or I'd go back and fix it. As I recall you said it off-mike in Bill Venners' talk, otherwise I would have fact-checked it!

Anyway, details aside I thought it was a great illustration of the impact on compiler performance that code complexity can have.