Hacker News new | ask | show | jobs
by cheald 4861 days ago
Kind of a poorly-named deck. It's really about why programs use features of these languages that end up causing poor performance relative to C, rather than why the individual VMs themselves are slow. It's no surprise that trading the byte-precision of C for the convenience of a garbage collector and heap-allocated data structures results in a performance decrease.

Dynamically-typed languages are often easier to program in, but require more copying (and memory allocation) as a result. Hash tables are heap-allocated and have to be garbage collected, but they're flexible - something you don't get with structs. Allocating and freeing memory has a cost, and that can add up quickly. Your primary line of optimization in most of these languages is "avoid the GC", which really boils down to "don't allocate more than you need to", which is sound advice in every language, scripting or otherwise.

4 comments

Did you read the deck? The GC isn't the problem; it's layout and management of allocations that's the problem, whether you use a garbage collector or explicit deallocation to clean up the resulting mess.

I think the idea that GC is what slows down dynamic languages has to be the most prevalent misconception about language performance.

Yeah. I dunno about "most prevalent", though. TFA contained two "lame excuses", both of which are things I believed to be causes of slowness in dynamic languages, and now I have to reconsider.

    dynamic typing prevents type-based optimization
    monkey patching prevents optimization
I think the most common complaint I hear about GC is not that it affects computational throughput, but that it affects _predictability_ of computational throughput. One maybe doesn't care in scientific computing, but game developers are always going on about how they can't use a GC language because a stall mid-frame will knock them over 16ms/frame or 33ms/frame, which for console certification is a project-killer.
It's worth noting that WoW uses Lua; at some point Blizzard switched from Lua 5.0 (with used a stop-the-world GC) to 5.1 (which introduced an incremental GC) specifically because of this problem. Before, UI code could generate excess tables, kick in the GC, and dramatically impact your framerate. The incremental GC significantly helped this, since it permits the GC to be run over multiple frames, reducing or even eliminating the perceptible impact on the user experience.
I did, and I think I might have mis-stated my point. GC thrash is a symptom of the problem, not the core problem itself. Manually managing allocations avoids the "resulting mess" that must be cleaned up from, which is where your big speed boost comes from. In general, the higher up you go, the further abstracted away from memory management you become. It's not that GC is inherently slow or anything, but simply that giving up control of where and how memory is allocated (in exchange for a more flexible language) is the reason for the speed difference.
It's not a given that GC is slower than manual memory management. See, for example, this work:

http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In a copy collector the GC time is proportional to the amount of live memory -- garbage is free. In FP-style programs (lots of short-lived allocations) GC can be essentially free.

The other main source of slowdown is to do with boxing and type checks. Accessing all data in a naive implementation of a dynamically typed language involves at least one type check and typically one pointer indirection to unbox the data. This can kill performance relative to the unboxed and unchecked equivalent. Consider, e.g., floating point operations -- they run in 1 cycle. If you add type check (2-3 cycles perhaps) and pointer indirection (10 cycles if it's in the cache) you can see how massive slowdowns easily arise.

Modern JS VMs will remove most of this cost. I doubt Python and Ruby do.

Basically, I would say it really depends on the interactions between the program and the language implementation.

> Modern JS VMs will remove most of this cost. I doubt Python and Ruby do.

In the python case, pypy removes this cost whenever it can. The less you use dynamic features like duck typing, the faster your code gets.

If the deck is to be believed, it's not the garbage collector or the heap that's causing the performance loss, it's that APIs and algorithms they enable are allocation-heavy compared to other "faster" languages.

Heap allocations are expensive even in non-GC languages.

The first example is lookup based on name, you can get away from that in python with slots but the common consensus is that it's not worth the performance improvement. That somehow translates into a dogma like, "Never use slots," but sometimes it is worth it.
> The first example is lookup based on name, you can get away from that in python with slots

You can get away from that with a sufficiently smart JIT turning a stable access to a known object into little more than a vtable dispatch. That's close to what V8 does with hidden classes on full cache hits (object map — its "hidden class" — + object "array" property both cached).

    point.x
will generate the assembly:

    cmp [ebx,<hidden class offset>],<cached hidden class>
    jne <inline cache miss>
    mov eax,[ebx, <cached x offset>]
(where ebx is the previously loaded `point`)
Which PyPy does, so __slots__ is entirely useless.
pypy is great, but I and others used swig and am stuck now with maintaining that stuff. There is also memory benefit to slots for when there are many instances. Sometimes you just have to do gross stuff so you can get on to the next thing, it's okay when everybody understands it enough.
The memory benefit also exists on PyPy without slots.

It may be possible to write a cffi backend for swig, but it's likely quite hard.

I'm certain you could have a language that is easy to program in, but doesn't use so much copying or use dictionaries so much. I'm not sure such a thing exists, though.
There's a whole continuum of stuff between Python/Ruby/JS and C, like Go, OCaml, and Rust. There's still a lot of refinement that could be done, though.