Hacker News new | ask | show | jobs
by Xurinos 4893 days ago
I guess I will be your essay-writing dissenter.

C is a high-level language with fewer features than many other languages, is not necessarily the engine behind other languages, has the problem of its programs poorly implementing a percentage of what other high level languages are capable of doing quickly and securely, and provides slow and troublesome memory allocation out-of-the-box. When comparing the speed of operations, people are rarely comparing apples to apples. And contrary to what is spattered on the boards, C is not an understandable, close-to-the-metal wrapper around assembly instructions (compilers have advanced quite a bit).

I love C. It does feel fast, and I get the illusion of being close to the metal. It was one of my first languages, holding a sentimental place in my heart. Very important things are written in it. It is a high-level language with some okay abstractions.

Is it underneath other high level languages? Maybe, if you mean that the compiler might be written in C in order to bootstrap the language. Of course, one could write the compiler in any language; it's all about translating programmer-friendly symbols into assembly or VM bytecode, right? And speed of compile is a different subject from speed of the compiled program.

But here is the gotcha on raw performance: Your large C program poorly implements a percentage of what other high level languages are capable of doing quickly and securely.

I once foolishly argued in favor of C's performance, saying that one could write a layer that supports all these nice features speedily, such as the data structures I will mention below as well as GC; by the time you do that, you might as well be using a different language. You probably implemented that layer poorly, compared to other languages with large communities pounding at and optimizing that layer. For example, when you implemented your "fast" list with the basic struct and next pointer, did you also implement the new-node creation in such a way as it still uses raw malloc(), as opposed to managing previously-malloced memory efficiently?

How many implementations of a basic list do we need in C? Super large integers? Fixed-point integers? Growable arrays? Lazy/infinite lists? Trees? Hash maps? Surely you don't think these other language designers said to themselves, "Let's support hash maps and make them slow." No, they came up with a fast standard, supported by their language, sometimes complete with various configuration options to make all the tradeoff decisions on making those data structures speed-efficient or memory-efficient for reads or writes. Others, of course, subscribe to a religion, er, a specific tradeoff, such as perl's approach to {}s ("There's more than one way to do it" ... unless you are dealing with hash tables).

What about all the wonderful memory management you can do in C? Aren't you closer to the metal that way, able to make basic memory allocation super speedy? Not really. This is part of the illusion. malloc() is slow enough that developers have rewritten versions of it several times. ROM-based MUDs, for example, manage their own memory, using an initial malloc, of course, but regularly using their own set of allocators and deallocators (free_string, str_dup, etc) on top of that allocation. There are these tricks and more in high level languages, including the sharing of partial structures (kinda like union but with more pointers and fewer bugs associated with those pointers), allowing for resource allocation strategies that can be "faster than C".

If the argument in favor of C's speed at the end of the day is, "When we write crappy programs with buffer overrun holes, memory leaks, and no error handling, it's super fast!", we are (1) not comparing apples to apples and (2) doing ourselves and our customers a grave disservice.

Let's be honest: C is no "closer to the metal" than other high level languages (http://news.ycombinator.com/item?id=3753530 and https://en.wikipedia.org/wiki/Low-level_programming_language...). The days of manually XORing to assign 0 to a variable are well behind us.

Note... I don't mean all high-level languages. There are many very slow implementations of these languages. Programmers are getting better at this stuff in modern implementations, gcc included. And it is fair to say that there are many things other languages can do that are, when you compare apples to apples, faster than C, especially when you factor in modern JIT compilation; and they also do some things slower than a similar function in C.

No, C isn't and won't be obsolete, not until people write popular OSes in other human-readable languages, complete with a body of excellent libraries. We operate in a world of legacy, working code.

Edit: Looks like we both must have not read the article before replying. The author goes over many of these points.

3 comments

> Let's be honest: C is no "closer to the metal" than other high level languages

This is dead wrong, and your links do not support it. This is exactly the kind of statement that gets me grumpy.

Your link illustrates that an aggressive C optimizer can collapse a chunk of C code down to something smaller and simpler than the original code. This is true.

But what you said is that C is "no closer to the metal" than other high-level languages. Let's examine this assumption.

Take this C function:

  int plus2(int x) { return x + 2; }
You can compile this down into the following machine code on x86-64, which fully implements the function for all possible inputs and needs no supporting runtime of any kind:

  lea    eax,[rdi+0x2]
  ret
Now take the equivalent function in Python:

  def plus2(x):
    return x + 2
In CPython this compiles down to the following byte code:

  3           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_ADD          
              7 RETURN_VALUE
Notice this is byte code and not machine code. Now suppose we wanted to compile this into machine code, could we get something out of it that looks like the assembly from our C function above? After all, you are claiming that C is "no closer to the metal" than other languages, so surely this must be possible?

The tricky part here is that BINARY_ADD opcode. BINARY_ADD has to handle the case where "x" is an object that implements an overloaded operator __add__(). And if it does, what then? Surely just a very few instructions of machine code will handle this case, if C is "no closer to the metal" than Python?

Well __add__() can be arbitrary Python code, so the only way you can implement this BINARY_ADD opcode is to implement an entire Python interpreter that runs __add__() in the overloaded operator case. And the Python interpreter is tens of thousands of lines of code in... C.

The end result is that writing the same function in C and Python is the difference between two machine code instructions and implementing an entire interpreter.

This is why I get grumpy when people deny that C is any different than other high-level languages. While this is a somewhat extreme case, you could make a similar argument about most operations that happen in other high-level languages; similar constructs will very frequently have less inherent cost in C.

Seconded. I'd add that languages like Python hide an enormous amount of not-close-to-the-metal-ness in every single statement, because every single statement has the implicit context "Interpreter, please interpret this string relative to your potentially complex internal state".

Haskell is only slightly less prone to this, despite being compiled, since every expression by default becomes a request to instantiate a thunk in the runtime's evaluation tree. Yes, you can contort yourself to avoid this, but at that point you are imperatively programming an expression tree evaluator. This can be fun and rewarding, but it's not that different from scripting a Python interpreter, and it's certainly much further from the metal than C.

Could you explain your second paragraph in a bit more detail? I feel you have honed in on a deep and significant pont, but my passing understanding of Haskell isn't allowing me to fully understand it.

In particular, what would it look like if you were avoiding instantiating thunks?

Very crudely (Haskell experts please chime in!):

Every Haskell program is a lazily-evaluated tree of expressions; the program output itself is the root node and links represent data flow dependencies. The Haskell compiler turns source code into an initial representation of this tree as well as machine code that performs a sequence of reduction steps to turn the tree into a final output value. So a tree node at any given time is either a concrete value, or a chunk of code that can be executed to obtain that value (a thunk).

The compiler has the right to choose an arbitrary (semantics-preserving) node evaluation sequence to reduce the tree down to the final result.

Through strictness annotations (http://www.haskell.org/haskellwiki/Performance/Strictness#Ev...) you can force particular subtrees to be fully evaluated whenever-in-program-execution their parent node begins evaluation; to programmers in other languages this seems alien, but in Haskell you, by default, turn control of operation sequencing to the compiler.

Such annotations are sometimes desirable because the thunk representation of, say, a large chain of arithmetic operations on a list of numbers can be very expensive in memory, so one might want to force it to be reduced down to a concrete double as soon as possible (see, e.g., http://benchmarksgame.alioth.debian.org/u32/program.php?test..., and note all the areas where function arguments are preceded by exclamation points; those are strictness annotations).

Strictness annotations are often important for getting good performance out of certain kinds of Haskell code, and are in some cases not just important but essential for controlling memory consumption.

It can be very hard for non-experts to reason about where and why to use strictness, and I submit that this is because you're essentially doing an indirect second level of programming. The un-annotated source code describes the value-transformation semantics, while the strictness annotations are a side-band language for controlling the tree-evaluator that is embedded into your compiled program.

Hope that helps :)

Thanks for writing this up!
While a fan of your original comment, I think the logic got a little flaky here. Claiming any language is 'close to the metal' is to make implicit reference to particular implementations of said language. In recent years there has been an explosion in the use of techniques that have levelled this old distinction.

For example today it is quite possible to have Javascript OO code that runs more efficiently than, say, calling through structs of function pointers in C, since modern compilers have morphed into such strange runtime beasts that they can make (and test at runtime for) stronger assumptions about the contents of those pointers than a traditional compiler ever could. Where an indirect call in a traditional C compiler will only ever be at best "CALL [rax]" (assuming no indirection required to implement dynamic linking), newer implementations might inline the called function entirely, all dependent on what input the program happens to have been fed on a particular run.

This trend is why I'm not certain of the odds for less expressive languages in medium term. Instead of a struct of function pointers, imagine an inheritance chain of structs of function pointers. Although it can be implemented in C it is a native concept in Javascript, and one that compilers can already optimize. While the JS compiler has explicit information fundamental to the language to infer this structure from (and version any dependent objects to implement a runtime guard), we aren't nearly at the point where our compilers can deduce such an idiom in C or assembly, optimize for it, or solve a huge satisfiability problem at runtime in order to produce guards for it.

It naturally follows that the more semantic information a compiler has available to it, the better chance, at least in the medium term (10-20 years?) it has of applying tractable optimizations.

This sounds like the Sufficiently Smart Compiler line of reasoning that was supposed to make Itanium the architecture of the future and has been forecast to make high-level languages faster than C any day now. In practice, the extra costs imposed by high-level languages (and in particular, the loss of memory management control) have outweighed the benefit of extra high-level information consistently for decades. And I see no reason why this will change.

In particular, the the optimizing VMs you are describing still have to be written in something. The proof is in the pudding: no serious language runtime is written in anything other than C or C++. The argument that C is being displaced will be an empty one until you start to see language runtimes being written in something else.

(Just to clarify, since another commenter missed this distinction earlier: I'm talking specifically about VMs/runtimes, not ahead-of-time compilers. Compilers are frequently written in non-C languages because an ahead-of-time transformation doesn't have the same stringent efficiency and resource usage requirements that language runtimes do).

The equivalent `plus2` OCaml function compiles to:

    camlAdd__plus2_1030:
    .L100:
    	addq	$4, %rax
    	ret
(It's using 4 instead of 2, because ints are boxed; a 0 in the last bit denotes an int, a 1 denotes an address).
Part of the point of the original article is that not even assembler is "close to the metal" any more. How long does that fragment of assembly code take to execute? Depends on whether the instructions are in the I-cache, whether some previous branch prediction has failed, and whether the data are in the cache. All this adds up to a couple of orders of magnitude.
That's cool and OCaml is an interesting language. However I'm sure you know it would not be difficult to come up with a different example where C lets you express "bare metal" idioms that are not as "bare metal" when expressed in OCaml.
> Now take the equivalent function in Python

Those functions maybe similar but they are not equivalent.

Apples and oranges yet again. The original article is about Haskell; an entirely different beast from Python. Haskell compiles down to machine code, not byte code.
I was responding to a comment that said that "C is no closer to the metal than other high level languages", without qualifying which high level languages.

But even Haskell is not as "bare metal" as C. Just because a language can compile to machine code doesn't mean it fills the same role that C does.

I think it is a little unfair to say C the language is not low-level because a C application must implement features (or use libraries) that higher-level languages would already provide.

However, your older post [1] contrasting the code generated by gcc -O0 and -O1 was illuminating and made me rethink my position. Thanks! <:)

[1] http://news.ycombinator.com/item?id=3753530

Of course. That's why people should write 90% of the non-speed-limited code in something higher level and the speed critical things in C

"And contrary to what is spattered on the boards, C is not an understandable, close-to-the-metal wrapper around assembly instructions (compilers have advanced quite a bit)"

Yes, it isn't BUT it is (due to compilers supporting) the language where you can drop down to assembler or something closer to it: http://msdn.microsoft.com/en-us/library/26td21ds(v=vs.80).as... in the easiest way