Hacker News new | ask | show | jobs
by haberman 4893 days ago
> Let's be honest: C is no "closer to the metal" than other high level languages

This is dead wrong, and your links do not support it. This is exactly the kind of statement that gets me grumpy.

Your link illustrates that an aggressive C optimizer can collapse a chunk of C code down to something smaller and simpler than the original code. This is true.

But what you said is that C is "no closer to the metal" than other high-level languages. Let's examine this assumption.

Take this C function:

  int plus2(int x) { return x + 2; }
You can compile this down into the following machine code on x86-64, which fully implements the function for all possible inputs and needs no supporting runtime of any kind:

  lea    eax,[rdi+0x2]
  ret
Now take the equivalent function in Python:

  def plus2(x):
    return x + 2
In CPython this compiles down to the following byte code:

  3           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_ADD          
              7 RETURN_VALUE
Notice this is byte code and not machine code. Now suppose we wanted to compile this into machine code, could we get something out of it that looks like the assembly from our C function above? After all, you are claiming that C is "no closer to the metal" than other languages, so surely this must be possible?

The tricky part here is that BINARY_ADD opcode. BINARY_ADD has to handle the case where "x" is an object that implements an overloaded operator __add__(). And if it does, what then? Surely just a very few instructions of machine code will handle this case, if C is "no closer to the metal" than Python?

Well __add__() can be arbitrary Python code, so the only way you can implement this BINARY_ADD opcode is to implement an entire Python interpreter that runs __add__() in the overloaded operator case. And the Python interpreter is tens of thousands of lines of code in... C.

The end result is that writing the same function in C and Python is the difference between two machine code instructions and implementing an entire interpreter.

This is why I get grumpy when people deny that C is any different than other high-level languages. While this is a somewhat extreme case, you could make a similar argument about most operations that happen in other high-level languages; similar constructs will very frequently have less inherent cost in C.

5 comments

Seconded. I'd add that languages like Python hide an enormous amount of not-close-to-the-metal-ness in every single statement, because every single statement has the implicit context "Interpreter, please interpret this string relative to your potentially complex internal state".

Haskell is only slightly less prone to this, despite being compiled, since every expression by default becomes a request to instantiate a thunk in the runtime's evaluation tree. Yes, you can contort yourself to avoid this, but at that point you are imperatively programming an expression tree evaluator. This can be fun and rewarding, but it's not that different from scripting a Python interpreter, and it's certainly much further from the metal than C.

Could you explain your second paragraph in a bit more detail? I feel you have honed in on a deep and significant pont, but my passing understanding of Haskell isn't allowing me to fully understand it.

In particular, what would it look like if you were avoiding instantiating thunks?

Very crudely (Haskell experts please chime in!):

Every Haskell program is a lazily-evaluated tree of expressions; the program output itself is the root node and links represent data flow dependencies. The Haskell compiler turns source code into an initial representation of this tree as well as machine code that performs a sequence of reduction steps to turn the tree into a final output value. So a tree node at any given time is either a concrete value, or a chunk of code that can be executed to obtain that value (a thunk).

The compiler has the right to choose an arbitrary (semantics-preserving) node evaluation sequence to reduce the tree down to the final result.

Through strictness annotations (http://www.haskell.org/haskellwiki/Performance/Strictness#Ev...) you can force particular subtrees to be fully evaluated whenever-in-program-execution their parent node begins evaluation; to programmers in other languages this seems alien, but in Haskell you, by default, turn control of operation sequencing to the compiler.

Such annotations are sometimes desirable because the thunk representation of, say, a large chain of arithmetic operations on a list of numbers can be very expensive in memory, so one might want to force it to be reduced down to a concrete double as soon as possible (see, e.g., http://benchmarksgame.alioth.debian.org/u32/program.php?test..., and note all the areas where function arguments are preceded by exclamation points; those are strictness annotations).

Strictness annotations are often important for getting good performance out of certain kinds of Haskell code, and are in some cases not just important but essential for controlling memory consumption.

It can be very hard for non-experts to reason about where and why to use strictness, and I submit that this is because you're essentially doing an indirect second level of programming. The un-annotated source code describes the value-transformation semantics, while the strictness annotations are a side-band language for controlling the tree-evaluator that is embedded into your compiled program.

Hope that helps :)

Thanks for writing this up!
While a fan of your original comment, I think the logic got a little flaky here. Claiming any language is 'close to the metal' is to make implicit reference to particular implementations of said language. In recent years there has been an explosion in the use of techniques that have levelled this old distinction.

For example today it is quite possible to have Javascript OO code that runs more efficiently than, say, calling through structs of function pointers in C, since modern compilers have morphed into such strange runtime beasts that they can make (and test at runtime for) stronger assumptions about the contents of those pointers than a traditional compiler ever could. Where an indirect call in a traditional C compiler will only ever be at best "CALL [rax]" (assuming no indirection required to implement dynamic linking), newer implementations might inline the called function entirely, all dependent on what input the program happens to have been fed on a particular run.

This trend is why I'm not certain of the odds for less expressive languages in medium term. Instead of a struct of function pointers, imagine an inheritance chain of structs of function pointers. Although it can be implemented in C it is a native concept in Javascript, and one that compilers can already optimize. While the JS compiler has explicit information fundamental to the language to infer this structure from (and version any dependent objects to implement a runtime guard), we aren't nearly at the point where our compilers can deduce such an idiom in C or assembly, optimize for it, or solve a huge satisfiability problem at runtime in order to produce guards for it.

It naturally follows that the more semantic information a compiler has available to it, the better chance, at least in the medium term (10-20 years?) it has of applying tractable optimizations.

This sounds like the Sufficiently Smart Compiler line of reasoning that was supposed to make Itanium the architecture of the future and has been forecast to make high-level languages faster than C any day now. In practice, the extra costs imposed by high-level languages (and in particular, the loss of memory management control) have outweighed the benefit of extra high-level information consistently for decades. And I see no reason why this will change.

In particular, the the optimizing VMs you are describing still have to be written in something. The proof is in the pudding: no serious language runtime is written in anything other than C or C++. The argument that C is being displaced will be an empty one until you start to see language runtimes being written in something else.

(Just to clarify, since another commenter missed this distinction earlier: I'm talking specifically about VMs/runtimes, not ahead-of-time compilers. Compilers are frequently written in non-C languages because an ahead-of-time transformation doesn't have the same stringent efficiency and resource usage requirements that language runtimes do).

The equivalent `plus2` OCaml function compiles to:

    camlAdd__plus2_1030:
    .L100:
    	addq	$4, %rax
    	ret
(It's using 4 instead of 2, because ints are boxed; a 0 in the last bit denotes an int, a 1 denotes an address).
Part of the point of the original article is that not even assembler is "close to the metal" any more. How long does that fragment of assembly code take to execute? Depends on whether the instructions are in the I-cache, whether some previous branch prediction has failed, and whether the data are in the cache. All this adds up to a couple of orders of magnitude.
That's cool and OCaml is an interesting language. However I'm sure you know it would not be difficult to come up with a different example where C lets you express "bare metal" idioms that are not as "bare metal" when expressed in OCaml.
> Now take the equivalent function in Python

Those functions maybe similar but they are not equivalent.

Apples and oranges yet again. The original article is about Haskell; an entirely different beast from Python. Haskell compiles down to machine code, not byte code.
I was responding to a comment that said that "C is no closer to the metal than other high level languages", without qualifying which high level languages.

But even Haskell is not as "bare metal" as C. Just because a language can compile to machine code doesn't mean it fills the same role that C does.