Hacker News new | ask | show | jobs
by eric_bullington 5003 days ago
Is that a blanket statement about C versus any dynamic language, or just JavaScript? Because with LuaJIT2, Mike Pall has already shown that a highly-optimized Lua interpreter can hold its own against optimized C, and sometimes even surpass it.

Here's a nice look at some of the magic he does in assembler to get LuaJIT at such a high level of performance:

http://article.gmane.org/gmane.comp.lang.lua.general/75426

My point is, there's nothing inherently unbeatable about C. I think that a select group of dynamic language -- probably including JavaScript -- can and will reach that level of performance, or at least close to it. They just haven't been around as long.

1 comments

Yeah, well please tell me how? The JavaScript engine is written in C so how could the language running on top of it be faster?

I will tell you how it can be - if the JavaScript engine JITs the JS in a more optimized way than the C compiler compiled the C.

Well how does that make JS faster than C when JS will always carry more overhead? It cannot and does not - all it means is that the compiler was far better.

A poor C compiler may be outperformed by the best JS one, but that is about the only scenario where JS will outperform C.

This argument reminds me of when I was 15 and trying to explain Unix security to my democoder friend, who kept insisting that he could bypass file permissions by just programming directly to the hardware and reading the bits off the disk.

If you want to keep arguing that every other language merely asymptotically approaches the performance of x86_64 assembly, that is a fine (if dumb) position. But the argument that C has a monopoly on assembly code generation is silly, as is the argument that C compilers are inherently better at code generation than all other compilers. Look at the LuaJIT2 direct-threading post upthread for an infamous example of a performant dispatching structure that isn't easily expressible in ANSI C.

> Look at the LuaJIT2 direct-threading post upthread for an infamous example of a performant dispatching structure that isn't easily expressible in ANSI C.

Sure, but the point of that post is that hand-written assembly can beat C-compiler-generated assembly. It's not arguing that any other language's runtime (even LuaJIT's own code generator) is going to generate assembly that's any better than gcc's in this case.

I do know that one of Mike Pall's stated goals with LuaJIT is to minimize the performance difference between Lua and C as much as possible. Though one of his methods of doing so (the foreign function interface: http://luajit.org/ext_ffi_tutorial.html) gives up the memory-safety of Lua to accomplish this. Even without FFI, LuaJIT is incredibly fast for functional/numerical calculations.

> If you want to keep arguing that every other language merely asymptotically approaches the performance of x86_64 assembly, that is a fine (if dumb) position.

I think what rubs some people (including me) the wrong way about these "faster than C" claims is the implied argument that C (and other non-GC'd languages) are obsolete. These claims often come from people who, as this article's author explicit states, want higher-level languages to win because they don't want to ever have to write in C or deal with anything written in C.

If people can get stuff in higher-level languages to run faster, then great, but those of us working in system-space in non-memory-safe languages are still going to beat their pants off if we combine all the tricks of their VM with domain-specific knowledge and optimizations (specifically parsing, in this case). "Faster than C" claims can sometimes feel like a big "F U" to those of us trying to innovate in system-space, like "our VM is the only system software we'll ever need." It's the Java attitude where people only want "100% pure Java" and shun JNI and anything that won't run on their VM.

Why would it be hard to generate the code for a direct-threaded interpreter?
That's exactly the question that Mike's whole post is answering. It turns out that the common optimizations like hoisting, CSE, tail-merging, etc. perform badly on direct-threaded interpreter code. Which is one of the primary reasons Mike wrote the interpreter for LuaJIT2 in hand-written assembly instead of C.
I think it is easier for the compiler for a high-level language to predict usage patterns --- which branches will be taken, which variables should be in regs or L1, which functions are best served by slower code with a small memory footprint vs. faster code with a larger footprint, what's best handled with a jump table versus with conditional jumps, &c &c --- that can a C compiler, and that over the long run languages with better, more consistent abstractions than C are going to have better runtime characteristics than the outputs of a C compiler.

Put differently: C compilers are hamstrung by all the dumb C programs a developer could express with C.

Javascript compilers have to deal with bad Javascript, don't get me wrong, but they don't have to deal so much with bad list and table implementations.

I get what the LuaJIT2 post is saying (and let's be clear: I'm not putting myself on a level with the Lua JIT people).

For what it's worth: C is my first language, and my favorite language. My background is systems programming. I would not enjoy figuring out how to reliably write a memory-mapped CSR while masking the right set of interrupts in Javascript. :)

The Lua people are not like the Java people. The whole point of the luajit ffi is to make interfacing with C code really easy, without any more function call overhead than calling C to C. So it is the ideal language to interface with performing well written C code without losing the benefits.
I don't know why I am wasting my breath either since it is not my law that abstractions (if both implementations are sound) always come with a performance tradeoff. It is just a fact (therefore considered a law), not made up by me.
The opposite is often true. The right abstractions can provide a performance benefit, when they force programs to conform to constructs that are easily expressed in performant native code.

The standard C library is full of functions that are slower than equivalent functionality in higher level languages. Or do you think ticking through arrays of hopefully-ASCII bytes, byte by byte, waiting for the 0 byte, is the fastest way to compare two strings?

> Or do you think ticking through arrays of hopefully-ASCII bytes, byte by byte, waiting for the 0 byte, is the fastest way to compare two strings?

I'm sure you must know this, but this is not even remotely how strcmp() is implemented in modern libc's.

It is quite close, actually. "not even remotely" is a very strong statement.

There are tricks that let them do this 8 bytes at a time (on AMD64, 4 bytes on x86), but that doesn't change the fact that in order to compare two 128KB strings which are equal, you actually have to read 2*128KB from memory and compare each single byte (in groups of 8, if you are lucky enough with your alignments and instruction set).

Different abstractions, such as Python's strings, can very often do this comparison with almost no memory access:

(a) if both strings are interned, it is enough to do a pointer comparison.

(b) if the length is not equal, the strings are not equal - a one word comparison.

(c) if both strings have been hashed before (quite likely), you can tell they are different if their hash is different - a one word comparison.

(d) if length is equal, and hash is (equal or uncomputed), you will have to do the comparison.

Whether this trade-off is worthy depends on your application. If most of your strings are 7-characters or less (as is often the case for software dealing with e.g. stock tickers), then the C approach on 64-bit archs wins hands down: you should actually have all the strings in-place because a pointer takes more memory and causes contention. However, if your strings tend to be 100 bytes or above, and many of them have equal prefixes, the Python approach wins hands down.

What's a modern libc? Not being snide or anything: I have no idea. I didn't know where to look so I looked at glibc [1] and that, in fact, does seem to be how it works.

[1] http://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcm...

Care to provide and example?

Your statement would only be true if the abstraction implemented something more efficiently than the C developer. This in of itself says nothing about C or X language for that matter being faster.

It could also be true if the abstraction lays bare optimizations not available (readily enough) to the C compiler. While this wouldn't make X faster than C as languages (what does that mean, anyway?), it could make an X compiler produce faster code than the best C compiler.
Sure: interned strings.
You are wrong. A JITted environment can take advantage of knowing about the current run of the system. As a result, it can do things like unroll loops and inline function calls.

Yes, you could do the same thing in C code, but the resulting executable would be unreasonably large because you would have to unroll every loop and inline every function.

Pypy has show itself faster than C in some cases as well[1].

1. http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-a...

That PyPy example has nothing to do with C and everything to do with a particular implementation of libc.
Maybe some day they will add profile guided optimization to C compilers.
They're not the same. VMs can assume invariants (which may not always hold) and compile specialized methods which depend on those assumptions and then fall back to a non-optimized version during the execution of a method. To do something like that (OSR) in C, you'll have to have a very sophisticated runtime.
you're kidding right? and at least referring to gcc's -fprofile-generate -fprofile-use right?
Profile-guided optimization is decades old. It was, for example, one of the things people loved about the (Ultrix?) DEC Alpha compiler was profile-driven optimization.
First VMS.

That was one of the outstanding compilers ever written.

There is a major difference between performing profile-generated optimizations at compile-time based on sample data, and performing it at run-time based on real data. It is one pretty good way in which JITs could beat programs implemented directly in C.
Not really since the 'sample' data is 'real' data which has been gathered during run-time.
It's not all about the quality of the compiler. C compilers have to deal with a language that has terrible semantics for aliasing, and in which cross module information flow is usually limited to what programmers type into header files (although this is slowly beginning to change with LTO and WPO). This is a significant disadvantage for an optimizing compiler. In contrast, JS implementations have the entire program text and can optimise, say, calling conventions however they like.

JS and Lua also have disadvantages, such as their inferior (but memory safe) representation of data, so I would not be surprised if JS were faster for some problems and slower for others.

That's why Luajit also has an unsafe C compatible representation too, the ffi.
Your argument is the same as the arguments that higher level arguments cannot be faster than assembly.

In theory, any optimization available to a compiler is also available to an assembly programmer. In practice humans do not match automation. This has been demonstrated both in theory and practice for decades.

There is no limit to the level of language that this argument could theoretically apply to. In practice the run-time guarantees and programmatic indirection that we demand in high level languages make the task insurmountable for current programs. But never underestimate what can happen with a good compiler and the right program.

> "In practice humans do not match automation." Actually in practice human's almost always surpass automation mostly because they can start with automated output and improve upon it.
Humans can do that. But in practice we don't seem to.

Instead the trend mostly seems to be that we do it by hand until automation becomes good enough to do it better than we were doing it, and then we move on to harder kinds of problems and let automation do its thing.

I say "mostly seems to be" because there are plenty of counter-examples. But that's the general trend. As is seen by the explosion of programmers working in what used to be considered unbearably slow scripting languages, and the relative lack of people doing traditional assembly programming.

"In practice humans do not match automation." Maybe this is a nitpicky argument, but to me this means one thing: when put head to head, humans do not do better than automated output. This is wrong for exactly the reason I said. In 99% of cases you start out with an unoptimized version, find the areas that need optimizing, and improve the automated output by hand. Thus in practice and in theory humans almost always beat automated output. The general trend of people working on higher level problems isn't because automation can do low level stuff better, it's because automation can do it well enough for most cases that we don't need a human to step in. Automation isn't doing better than what a human would do, but it is doing well enough that there's no need to go back and optimize it.
In 99% of cases you start out with an unoptimized version, find the areas that need optimizing, and improve the automated output by hand.

This strongly depends on the scenario.

If people are identifying and optimizing a hot spot, it tends to happen this way, yes.

But what I am thinking about is the common situation where a company produces a product in a low level language, and then later on a competitor produces a competing product in a high level language. The second entrant is optimized by an automated process, the first was done by hand, and can't easily be rewritten.

In these situations - and I have encountered several - the second implementation frequently winds up doing more and doing it faster than the first implementation.

> I will tell you how it can be - if the JavaScript engine JITs the JS in a more optimized way than the C compiler compiled the C.

Yes. Exactly. Let's say there are two languages A and B, and A is compiled (be it JIT or AOT) by a smarter compiler, resulting in faster code than the same program written in language B, how does that not make language A faster?

I do know that comparing interpreted/JITed code and AOT-compiled code is somewhat nonsensical, but then again, so is talking about "faster" and "slower" languages.

>Yes. Exactly. Let's say there are two languages A and B, and A is compiled (be it JIT or AOT) by a smarter compiler, resulting in faster code than the same program written in language B, how does that not make language A faster?

Well I am talking about the performance limitations imposed by the laws of physics. Not which compiler is better. I can always find a better or worse C or JS compiler/JITter so that proves nothing.

Doing more (Which a more highly abstracted languages must do) in less time with all else being the same is not possible as we understand quantum physics today.

> Doing more (Which a more highly abstracted languages must do)

Can you explain why you think this assumption is true? I don't think it is.

A higher level of abstraction may allow your to convey more information in less code, but that does not mean it necessarily translates to more machine code.

In fact, when you operate on a higher level of abstraction with more information available, the compiler may even have more freedom to produce highly efficient output without breaking expected semantics.

Wow.
It's not like performance is a fundamental property of the language, it's just a matter of which tools are available. JITs outperforming compilers is just a matter of specialized compilation approach being able to outperform a generalized compilation approach, which is actually a far more fundamental truth than abstractions having a performance penalty. LLVM outperforms GCC in some cases.

However, in general JITs do not outperform GCC. The JIT overhead only pays off in a limited set of cases.

Most people publishing performance benchmarks on the Internet barely know how to compile a C program for production, let alone write or benchmark one. Google did a thorough benchmark of several languages and C++ was the clear winner: https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...

>> Google did a thorough benchmark... <<

No. One person employed at Google did a comparison and lots of other people (including some others employed at Google) explained what they thought was wrong with that comparison.

In the abstract, a just-in-time compiler will beat an ahead-of-time compiler because it has more information, including statistics about control flow (which branches tend to be taken, etc.) and what exact instruction set is being targeted.