Hacker News new | ask | show | jobs
by mustpax 4365 days ago
Exactly! Go makes the cost of generic code explicitly visible.

Generics encourage over-generalizing behavior that runs counter to writing highly performant code. If you care about speed, you don't spend time making your code generic. You optimize closely to your use case.

1 comments

>Go makes the cost of generic code explicitly visible. Generics encourage over-generalizing behavior that runs counter to writing highly performant code.

I think you may be missing some info regarding generics in Rust and Haskell. As I mentioned in the article, there is zero runtime overhead for generic programming in Rust and Haskell. Zip. Zilch. Nada. That's why their constraint-based static generics system is awesome.

Haskell's generic programming constructs do not have zero runtime overhead (at least on GHC, the dominant compiler). Typeclass-overloaded functions take an extra argument at runtime, in which the class "methods" are looked up. In particular, the typeclass-based definition of add3 turns into this Core code, which has an extra "$dNum_az0" arg to carry Num methods:

  ghci>let add3 a b c = a + b + c
  
  ==================== Simplified expression ====================
  GHC.Base.returnIO
    (GHC.Types.:
       ((\ (@ a_ayZ)
           ($dNum_az0 :: GHC.Num.Num a_ayZ)
           (a_ayG :: a_ayZ)
           (b_ayH :: a_ayZ)
           (c_ayI :: a_ayZ) ->
           GHC.Num.+ $dNum_az0 (GHC.Num.+ $dNum_az0 a_ayG b_ayH) c_ayI)
        `cast` ...)
       (GHC.Types.[]))
Compare this to the Int-specialized add3, which does not have to be passed the extra $dNum_az0 argument:

  ghci>let add3 a b c = a + b + c; add3 :: Int -> Int -> Int -> Int
  
  ==================== Simplified expression ====================
  GHC.Base.returnIO
    (GHC.Types.:
       ((\ (a_azj :: GHC.Types.Int)
           (b_azk :: GHC.Types.Int)
           (c_azl :: GHC.Types.Int) ->
           GHC.Num.+
             GHC.Num.$fNumInt (GHC.Num.+ GHC.Num.$fNumInt a_azj b_azk) c_azl)
        `cast` ...)
       (GHC.Types.[]))
Now, am I saying the the typeclass method isn't fast, or that GHC can't then optimize that Num dictionary away via specialization or inlining? No, I am not saying that. But it certainly doesn't always do that, resulting in a performance hit at runtime. More info @ http://www.haskell.org/haskellwiki/Performance/Overloading
If you have an explicit type signature and compile with -O, GHC will auto-specialize, afaik.
Please correct me if I am wrong about boxing in Rust, but doesn't boxing produce some overhead, by creating extra heap allocations (and therefore possible memory fragmentation and almost certainly poor cache locality), as well as pointer dereferences?

I really hate it in Java when I need an array of bytes but don't know the size in advance, or the size changes, and I have to incur all the cost of boxing those bytes up. On 64-bit systems (most of them), pointers are 64 bits which is at least twice as large as the most common things you put in a list (int, float, byte).

Boxing is heap allocation. Rust lets you explicitly choose if something is on the heap or on the stack. Idiom prefers stack allocation.

You don't need to box something to make it generic.

Lack of generics is part of the reason why Go is easier to learn and the Go compiler is faster than, say, Rust.

Sure, generics don't have runtime overhead in Rust and Haskell but they have other costs. You always pay for abstractions some way.

> Lack of generics is part of the reason why…the Go compiler is faster than, say, Rust.

The speed of the Rust compiler has little to do with generics and everything to do with LLVM and its optimizations and code generation. (Run with -Z time-passes if you don't believe me.)

I don't know if it has to do with generics, but you can't blame it on LLVM.

    /usr/src/rust/src/libsyntax % time /usr/src/rust/x/x86_64-apple-darwin/stage2/bin/rustc lib.rs -o /tmp/x.dylib --crate-type dylib -C prefer-dynamic -Z time-passes
    [most output removed...]
    time: 6.186 s	type checking
    time: 5.084 s	translation
    time: 7.221 s	LLVM passes
    /usr/src/rust/x/x86_64-apple-darwin/stage2/bin/rustc lib.rs -o /tmp/x.dylib    22.44s user 0.76s system 99% cpu 23.301 total
First of all: no more than 1/3 to 1/2 of the time is spent in LLVM in this particular example. Okay, that's cheating because there is no -O, as I'm guessing your statement supposed, but 22 seconds is already a huge amount of time to compile 30,000 lines of code, so unoptimized builds are relevant to claims that rustc is slow. The other points apply regardless of optimization setting.

Second: rustc will take this long every time to recompile libsyntax every time anything in it changes. If this were written in C, most changes would only require recompiling one source file, even though, again, header files are not treated well (something that Rust does not need to replicate). In practice this means that typical latency between changing something and seeing the output in a C/C++ program is an order of magnitude faster.

Third: The same separation that makes incremental compilation work in C/C++ allows parallelism (make -j) in full builds. rustc uses only one core per crate. Again, headers reduce the gains but Rust doesn't have that problem.

Fourth: If we compare to C rather than C++, we're off by sometimes an order of magnitude regardless of parallelism. Here is some random C program (Apple as), a total of 26929 lines, compiling in 0.90 seconds:

    clang -o foo -I ../include -I ../include/gnu -I. -DNeXT_MOD -DI386 -Di486      0.73s user 0.16s system 98% cpu 0.904 total
(With optimizations it is 2.426 seconds.)

Or libpng, 32433 lines in 0.83 seconds:

    clang -o png *.c -I. -lz  0.70s user 0.12s system 98% cpu 0.831 total
With the Linux kernel, compiled without -j and at -O1 using GCC (both of which make it slower), it's not an order of magnitude off, but it's still significantly faster than Rust: make ARCH=arm zImage 540.25s user 73.71s system 96% cpu 10:33.81 total

It compiled 1572713 lines of code; normalizing to 30,000 gives us about 12 seconds.

On the Rust side, linking libstd, about the same size, takes 6 seconds, but librustc, three times the size, took 216 seconds (10 times as long as libsyntax) to get to linking. So it varies, but I guess libsyntax is representative.

I do not know enough to have a definitive opinion of why this difference exists, but judging from the difference between C and C++, I wouldn't be surprised if it were related to Rust's complexity, which includes generics.

> First of all: no more than 1/3 to 1/2 of the time is spent in LLVM in this particular example.

"translation" is mostly LLVM (in particular, allocation of LLVM data structures).

> I do not know enough to have a definitive opinion of why this difference exists, but judging from the difference between C and C++, I wouldn't be surprised if it were related to Rust's complexity, which includes generics.

If you look at the amount of code in a typical large Rust program that is related to generic instantiations, then it's pretty miniscule (less than 10%).

Typechecking is known to be slow because the way we do method lookups is not well optimized. I don't believe that this is a fundamental issue; it's more that nobody has gotten around to improving it yet.

I don't think the rust compiler has been particularly extensively optimised at this point. There's no blindingly obvious hotspots (last I profiled the biggest single time sink was hashmap lookups in type checking), but there's not been any real effort in reducing compile time yet, as far as I know.
Based on this logic I assume you code in nothing but machine code? After all even assembler is an abstraction and therefore must have a cost. Heaven forbid you should do something as extravagant as use C for something, that level of abstraction (all those long jumps and stack manipulations, oh my) must positively destroy performance. Do you also happen to program using the gentle flapping of butterfly wings to disturb air currents and redirect cosmic rays?
The rust compiler is actually really quick. The LLVM optimization passes and code gen is where most time is spent (this is after monomorphisation of generics). The Go compiler does very few optimizations compared to LLVM, which leads to faster build times but slower code.
Last I checked compilation time wasn't really a thing a ton of folks worry about (myself included). Faster machines and reasonably better compilers have mostly solved this problem. I'd take zero-overhead generics over a slightly faster compiler any day of the week.
(incremental) compilation needs to be fast enough. A too-slow build cycle can really kill productivity, both in trying out new ideas for code and in debugging. The build times of the rust compiler are pretty close to the limit of what I'll tolerate, and I would really love to have compilation speeds similar to go.
Speak for yourself - slow compilation drives me crazy, as it increases the latency between making a change and seeing the result. Not that I believe that generics require slow compilation.
Fast build time is explicitly something Go was built for, to fix 40 minute compile times.
The dmd compiler for D compiles faster than gc and yet supports compile-time templates.