Hacker News new | ask | show | jobs
by Mr_P 4369 days ago
These two statements made me cringe:

> But, for systems programming, abstractions suck. They always, always have a cost.

> Generics? Here's another if statement, put it inside your for loop.

If you care about speed (and many systems programmers do), this is exactly the opposite of what you want to do. Unlike your proposal of putting potentially-costly if-statements inside of for loops, generics/templates in c++ provide zero-cost abstraction (in terms of execution time. If you think dealing with the error messages presents too high cost in terms of developer-time, switch to clang).

2 comments

If you truly care about speed you'll have different optimizations for int32 and int64.

Depending on who you are working with, the lack of generics is a blessing. Some developers can't restrain themselves and create over-complex abstractions that are used only once.

Generics generally simplify code, you know. Plus, generic code tend to make guarantees non-generic code cannot.

Picture this function:

  foo :: (a -> b) -> (b -> c) -> (a -> c)
What does it do?

As you can see this function takes 2 arguments, which appears to be functions. It also returns a function. The argument and return type of these functions are unknown, so you can't manipulate them. You can just pass them around directly. This puts really tight constraints on your code. So, assuming nothing fancy happens, there is only one correct body of code for this type signature:

  foo f g = \x -> g (f x)
In other words, function composition.

---

Generic functions have more guarantees than non-generic functions. Therefore, you are more likely to know what a generic function is actually doing.

Parametric polymorphism does not simplify code, it obfuscates it as now you have to see _what_ is calling it. A better form IMO is restricting the types allowed as it: 1. makes understanding much easier, 2. Doesn't create bloat in the form of n copies for visible symbols.
> Parametric polymorphism does not simplify code, it obfuscates it as now you have to see _what_ is calling it.

This is backward. A function, polymorphic or otherwise, does not influence code that does not call it. Modulo far-reaching side effects of course.

It's when you look at the call site that you have to figure out what this strange `fold` function could possibly be about.

I'll grant that polymorphic functions are often more abstract than monomorphic ones. But they are simpler. They are also better at separating concerns. Take `map` and `filter` for instance. They capture common iteration patterns, so you don't have to entangle that pattern with the actual logic of your loop. Without parametric polymorphism, you could not write them (more precisely, you would have to repeat yourself over and over).

> A better form IMO is restricting the types allowed as it: 1. makes understanding much easier

That's just false. If you restrict the types a function can operate on, you allow the function to do more things to its data. The more you know about its input, the less you know about the function. With parametric polymorphism, you are actually hiding information from the function, preventing it from making whole classes of mistake. Free tests!

Parametric polymorphism makes functions that use it simpler (as in, less moving parts). How could that possibly be harder to understand? Please give a concrete example, I don't understand where you're coming from right now.

> 2. Doesn't create bloat in the form of n copies for visible symbols.

That's an implementation detail, and mostly false anyway. Not every language is C++. Most languages that make use of parametric polymorphism don't duplicate code.

> A function, polymorphic or otherwise, does not influence code that does not call it.

Obfuscate does not mean influence, it obfuscates it you now have to follow the rabbit hole to where the type information is (language detail but it is how almost all parametric polymorphism is) which makes reasoning about it a pain.

> That's an implementation detail, and mostly false anyway.

No it's not 'unbound' parametric polymorphism in a compiled language has to produce symbols for any visible (not internal) function as there would be no way to know what might get called.

> Most languages that make use of parametric polymorphism don't duplicate code.

Yes any compiled language does how on earth would you call a symbol that took a bool vs a size_t (I recommend you look at how ELF works). On somewhat related note a sufficiently smart compiler could 'deduplicate' common parts of the slow path within a generic function and create calls but that's about all it could do for deduplication without performance hits.

You need practice with an ML derivative. Fetch a Haskell tutorial, then go write a little project of your choice that requires a few dozens lines.

> Obfuscate does not mean influence, it obfuscates it you now have to—

Wow, slow down. And give me an example, or I won't know what you mean.

> No it's not 'unbound' parametric polymorphism in a compiled language has to produce symbols for any visible (not internal) function as there would be no way to know what might get called.

Your lack of punctuation is hard to parse.

Anyway, it doesn't work like that. C++ for instance doesn't instantiate the polymorphic function for every possible type. Actually, it tries to compile monomorphic code first and only specialized polymorphic stuff as needed. This is why you need to actually use template code before the compiler can check it properly. (Notice how some error messages only surface when you use template code?

Let me give you an example (untested code):

  template<typename T>
  T sum(vector<T> vec)
  {
    T sum = vec[0];
    for (int i = 0; i < vec.size(); i++)
      sum = sum + vec[i];
    return sum;
  }
Why the generic code? Because I'm likely to perform sums on integers, floating point numbers, complex numbers and other fancy stuff. I fail to see how this approach obfuscates anything, since it let me write less code.

Now my C++ compiler will not compile this function for integers, floats, and every user-defined class I have in my program. If my program only uses it on vectors of integers, it will only instantiate the integer version, even if I have floats in my program.

> Yes any compiled language does how on earth would you call a symbol that took a bool vs a size_t (I recommend you look at how ELF works).

Muhaha, you foolish mortal. Let me tell you how this works in OCaml.

Under the hood, OCaml data are one of two things: an integer, or a pointer to heap data. The compiler can distinguish them thanks to the least significant bit: 1 when it is an integer, 0 when it is a pointer.

This is possible because in most machines pointers are generally aligned to word boundaries, and words are almost always 16 bits or more. Integers on the other hand have one less bit. On a 32 bit machine for instance, OCaml integers fit in the 31 most significant bits. More precisely, when the program sees a 32 bit word whose value is 43, it knows that it's an integer, and that the underlying integer is 21 (meaning, 43>>1).

You will note this suspiciously looks like a dynamic language implementation of runtime tags. It is a bit cleverer than that. First, the code is statically checked, so it never performs any runtime check with respect to this tag. The garbage collector on the other hand knows little about the program, and needs a way to distinguish raw data from heap pointers to do its job.

Now polymorphic code. In OCaml, a polymorphic function knows nothing about its polymorphic arguments. This is important, because it mean the code inside won't ever inspect nor modify the values at runtime. Take this example:

  let app f x = f x         (* app: (a -> b) -> a -> b *)
`app` is function application reified into a function. Yes it's silly in most cases. Bear with me. Look at its type. It accepts two arguments (of type a->b and a respectively), and returns a value (of type b). As you may have noticed, we have no frigging clue what those `a` and `b` mean. That's what it means to be polymorphic.

Now let's call the function on actual arguments.

  app (fun x -> x + 1) 42   (* the result is 43 *)
Okay, so the first argument is a function from integers to integers, and the second argument is… 42 (an integer). And poof magic, it works.

Under the hood it's not complicated. `app` knows that its first argument is a function, and it knows that the type of its second argument is compatible with that function. Since the first argument is a function, at runtime, it must be represented by a pointer to the heap. More specifically, it will point to a closure on the heap. We don't know much about this closure:

  +---+-------+
  | f | Data… |
  +---+-------+
We don't know anything about that `Data` stuff, but we do know that `f` is a pointer to code that will accept at least one argument.

Then there is 42. In the CPU it will be represented as 85 (42<<1 + 1). But it doesn't matter. `app` doesn't know if it's an integer or a pointer to a heap: from where it stands that word is just an opaque blob of data. The only safe thing it can do with it is copy it around. (And the static type checker ensures it does no more than that.)

So… `app` has 3 things: a pointer to a closure, a pointer to a piece of code, and an opaque blob of data (which happens to be an integer, but it doesn't know that). What it must do is clear:

  - Push the opaque blob of data to the heap.
  - Push the pointer to the closure to the heap.
  - Call f
And voilà, we have polymorphic code at the assembly language level. By carefully not inspecting the data, it works on every kind of data. No need for de-duplication.

Still, we don't have our result. We just called `f`, which has 2 arguments to contend with: its closure, and its "real" argument: 42. Now as you can see in the source code, `f` is not polymorphic at all. It works on integers. So it knows about its argument. Actually, it knows two things: the `Data` part of the closure is empty, and its argument is an integer. So it just adds 1 to 42 (possibly using some clever CPU arithmetic involving the LEA instruction), pops 2 elements off the stack, and pushes its result (43, which we represent as 87).

Now we're back to our polymorphic `f` which has this 87 blob of opaque data at the top of the stack. Well, it just returns it to its own caller, who hopefully will know how to handle that data.

---

As I have just illustrated, there is no need for duplication in the first place. Polymorphic code in OCaml generates polymorphic code at the assembly language level. And this was a naive compilation scheme. "Dumb" turns out to be sufficiently smart. De-duplication out of the box if you will.

And about that "slow path" (implying a fast path somewhere) that is typical of JIT compilation, we don't have that shit in statically typed functional languages. The "slow path" is already fast, since it doesn't perform any test at runtime!

> If you truly care about speed you'll have different optimizations for int32 and int64.

This is exactly why c++ allows template specialization, and if you don't care for hand-optimizing, you can get both implementations almost for free.

> Depending on who you are working with, the lack of generics is a blessing. Some developers can't restrain themselves and create over-complex abstractions that are used only once.

I can't comment on the competency of your coworkers, but I certainly see how Go could be useful in situations without the kind of performance-constraints which demand a language like c++.

> If you truly care about speed you'll have different optimizations for int32 and int64.

I'm pretty sure that's what C++ templates specialisation is for...

Which in c++ you can easily do, as you probably know. All you need to do is have an if statement that compares the type parameter of your template to int32 and int64. The good thing is that this kind of optimizations is transparent to the user if the code works. Even Haskell has something similar, with the SPECIALIZE pragma. Although in this case you have to trust the compiler to come up with the optimizations.
Exactly! Go makes the cost of generic code explicitly visible.

Generics encourage over-generalizing behavior that runs counter to writing highly performant code. If you care about speed, you don't spend time making your code generic. You optimize closely to your use case.

>Go makes the cost of generic code explicitly visible. Generics encourage over-generalizing behavior that runs counter to writing highly performant code.

I think you may be missing some info regarding generics in Rust and Haskell. As I mentioned in the article, there is zero runtime overhead for generic programming in Rust and Haskell. Zip. Zilch. Nada. That's why their constraint-based static generics system is awesome.

Haskell's generic programming constructs do not have zero runtime overhead (at least on GHC, the dominant compiler). Typeclass-overloaded functions take an extra argument at runtime, in which the class "methods" are looked up. In particular, the typeclass-based definition of add3 turns into this Core code, which has an extra "$dNum_az0" arg to carry Num methods:

  ghci>let add3 a b c = a + b + c
  
  ==================== Simplified expression ====================
  GHC.Base.returnIO
    (GHC.Types.:
       ((\ (@ a_ayZ)
           ($dNum_az0 :: GHC.Num.Num a_ayZ)
           (a_ayG :: a_ayZ)
           (b_ayH :: a_ayZ)
           (c_ayI :: a_ayZ) ->
           GHC.Num.+ $dNum_az0 (GHC.Num.+ $dNum_az0 a_ayG b_ayH) c_ayI)
        `cast` ...)
       (GHC.Types.[]))
Compare this to the Int-specialized add3, which does not have to be passed the extra $dNum_az0 argument:

  ghci>let add3 a b c = a + b + c; add3 :: Int -> Int -> Int -> Int
  
  ==================== Simplified expression ====================
  GHC.Base.returnIO
    (GHC.Types.:
       ((\ (a_azj :: GHC.Types.Int)
           (b_azk :: GHC.Types.Int)
           (c_azl :: GHC.Types.Int) ->
           GHC.Num.+
             GHC.Num.$fNumInt (GHC.Num.+ GHC.Num.$fNumInt a_azj b_azk) c_azl)
        `cast` ...)
       (GHC.Types.[]))
Now, am I saying the the typeclass method isn't fast, or that GHC can't then optimize that Num dictionary away via specialization or inlining? No, I am not saying that. But it certainly doesn't always do that, resulting in a performance hit at runtime. More info @ http://www.haskell.org/haskellwiki/Performance/Overloading
If you have an explicit type signature and compile with -O, GHC will auto-specialize, afaik.
Please correct me if I am wrong about boxing in Rust, but doesn't boxing produce some overhead, by creating extra heap allocations (and therefore possible memory fragmentation and almost certainly poor cache locality), as well as pointer dereferences?

I really hate it in Java when I need an array of bytes but don't know the size in advance, or the size changes, and I have to incur all the cost of boxing those bytes up. On 64-bit systems (most of them), pointers are 64 bits which is at least twice as large as the most common things you put in a list (int, float, byte).

Boxing is heap allocation. Rust lets you explicitly choose if something is on the heap or on the stack. Idiom prefers stack allocation.

You don't need to box something to make it generic.

Lack of generics is part of the reason why Go is easier to learn and the Go compiler is faster than, say, Rust.

Sure, generics don't have runtime overhead in Rust and Haskell but they have other costs. You always pay for abstractions some way.

> Lack of generics is part of the reason why…the Go compiler is faster than, say, Rust.

The speed of the Rust compiler has little to do with generics and everything to do with LLVM and its optimizations and code generation. (Run with -Z time-passes if you don't believe me.)

I don't know if it has to do with generics, but you can't blame it on LLVM.

    /usr/src/rust/src/libsyntax % time /usr/src/rust/x/x86_64-apple-darwin/stage2/bin/rustc lib.rs -o /tmp/x.dylib --crate-type dylib -C prefer-dynamic -Z time-passes
    [most output removed...]
    time: 6.186 s	type checking
    time: 5.084 s	translation
    time: 7.221 s	LLVM passes
    /usr/src/rust/x/x86_64-apple-darwin/stage2/bin/rustc lib.rs -o /tmp/x.dylib    22.44s user 0.76s system 99% cpu 23.301 total
First of all: no more than 1/3 to 1/2 of the time is spent in LLVM in this particular example. Okay, that's cheating because there is no -O, as I'm guessing your statement supposed, but 22 seconds is already a huge amount of time to compile 30,000 lines of code, so unoptimized builds are relevant to claims that rustc is slow. The other points apply regardless of optimization setting.

Second: rustc will take this long every time to recompile libsyntax every time anything in it changes. If this were written in C, most changes would only require recompiling one source file, even though, again, header files are not treated well (something that Rust does not need to replicate). In practice this means that typical latency between changing something and seeing the output in a C/C++ program is an order of magnitude faster.

Third: The same separation that makes incremental compilation work in C/C++ allows parallelism (make -j) in full builds. rustc uses only one core per crate. Again, headers reduce the gains but Rust doesn't have that problem.

Fourth: If we compare to C rather than C++, we're off by sometimes an order of magnitude regardless of parallelism. Here is some random C program (Apple as), a total of 26929 lines, compiling in 0.90 seconds:

    clang -o foo -I ../include -I ../include/gnu -I. -DNeXT_MOD -DI386 -Di486      0.73s user 0.16s system 98% cpu 0.904 total
(With optimizations it is 2.426 seconds.)

Or libpng, 32433 lines in 0.83 seconds:

    clang -o png *.c -I. -lz  0.70s user 0.12s system 98% cpu 0.831 total
With the Linux kernel, compiled without -j and at -O1 using GCC (both of which make it slower), it's not an order of magnitude off, but it's still significantly faster than Rust: make ARCH=arm zImage 540.25s user 73.71s system 96% cpu 10:33.81 total

It compiled 1572713 lines of code; normalizing to 30,000 gives us about 12 seconds.

On the Rust side, linking libstd, about the same size, takes 6 seconds, but librustc, three times the size, took 216 seconds (10 times as long as libsyntax) to get to linking. So it varies, but I guess libsyntax is representative.

I do not know enough to have a definitive opinion of why this difference exists, but judging from the difference between C and C++, I wouldn't be surprised if it were related to Rust's complexity, which includes generics.

> First of all: no more than 1/3 to 1/2 of the time is spent in LLVM in this particular example.

"translation" is mostly LLVM (in particular, allocation of LLVM data structures).

> I do not know enough to have a definitive opinion of why this difference exists, but judging from the difference between C and C++, I wouldn't be surprised if it were related to Rust's complexity, which includes generics.

If you look at the amount of code in a typical large Rust program that is related to generic instantiations, then it's pretty miniscule (less than 10%).

Typechecking is known to be slow because the way we do method lookups is not well optimized. I don't believe that this is a fundamental issue; it's more that nobody has gotten around to improving it yet.

I don't think the rust compiler has been particularly extensively optimised at this point. There's no blindingly obvious hotspots (last I profiled the biggest single time sink was hashmap lookups in type checking), but there's not been any real effort in reducing compile time yet, as far as I know.
Based on this logic I assume you code in nothing but machine code? After all even assembler is an abstraction and therefore must have a cost. Heaven forbid you should do something as extravagant as use C for something, that level of abstraction (all those long jumps and stack manipulations, oh my) must positively destroy performance. Do you also happen to program using the gentle flapping of butterfly wings to disturb air currents and redirect cosmic rays?
The rust compiler is actually really quick. The LLVM optimization passes and code gen is where most time is spent (this is after monomorphisation of generics). The Go compiler does very few optimizations compared to LLVM, which leads to faster build times but slower code.
Last I checked compilation time wasn't really a thing a ton of folks worry about (myself included). Faster machines and reasonably better compilers have mostly solved this problem. I'd take zero-overhead generics over a slightly faster compiler any day of the week.
(incremental) compilation needs to be fast enough. A too-slow build cycle can really kill productivity, both in trying out new ideas for code and in debugging. The build times of the rust compiler are pretty close to the limit of what I'll tolerate, and I would really love to have compilation speeds similar to go.
Speak for yourself - slow compilation drives me crazy, as it increases the latency between making a change and seeing the result. Not that I believe that generics require slow compilation.
Fast build time is explicitly something Go was built for, to fix 40 minute compile times.
The dmd compiler for D compiles faster than gc and yet supports compile-time templates.