by nsajko
1366 days ago
When people complain about the "negative performance impact of GC", they're often actually bothered by badly designed languages like Java that force heap allocation of almost everything. I think this might have been fixed in the latest versions of Java, though I'm not sure whether value types are already in the language or just coming soon. Aside from that, it's my understanding that GC can be both a blessing and a curse for performance (throughput); that is, a sufficiently advanced GC implementation should (theoretically?) be faster than manual memory management.
There are a few different ways a GC impacts code performance. First, even low-latency GCs have a pause latency similar to or worse than a blocking disk op on modern hardware. In high-performance systems we avoid blocking disk ops entirely, specifically because they cause a significant loss in throughput, and use io_submit/io_uring instead. Worse, we have limited control over when a GC occurs; at least with blocking disk ops we can often defer them until a convenient time. To fit within these processing models, worst-case GC latency would need to be much closer to microseconds.
Second, a GC operation tends to thrash the CPU cache, the contents of which were carefully orchestrated by the process to maximize throughput before being interrupted. This is part of the reason high-performance software avoids context-switching at all costs (see also: thread-per-core software architecture). It is also an important and under-appreciated aspect of disk cache replacement algorithms, for example; an algorithm that avoids thrashing the CPU cache can have a higher overall performance than an algorithm that has a higher cache hit rate.
Lastly, when there is a large stall (e.g. a millisecond) in the processing pipeline outside the control of the process, the effects of that stall propagate through the rest of the system. It becomes very difficult to guarantee robust behavior, safety, or resource bounds when code can stop running at arbitrary points in time. While the GC is happening, finite queues are filling up. Protecting against this requires conservative architectures that leave a lot of performance on the table. If all non-deterministic behavior is asynchronous, we can optimize away many things that can never happen.
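To put rough numbers on the "finite queues are filling up" point, here is a back-of-the-envelope sketch. The function names and the figures (2 million events per second, a 4096-slot queue, a 1 ms stall) are hypothetical, picked only for illustration; it assumes a steady arrival rate and a consumer that is fully paused for the whole stall.

```python
def arrivals_during_stall(rate_per_s: int, stall_us: int) -> int:
    """Events that arrive while the consumer is paused, rounded up.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return -(-rate_per_s * stall_us // 1_000_000)


def dropped_events(capacity: int, backlog: int, rate_per_s: int, stall_us: int) -> int:
    """Events that overflow a bounded queue if the consumer stalls right now."""
    free_slots = capacity - backlog
    return max(0, arrivals_during_stall(rate_per_s, stall_us) - free_slots)


def usable_fraction(capacity: int, rate_per_s: int, stall_us: int) -> float:
    """Share of the queue left for normal backlog once headroom for a
    worst-case stall has been reserved."""
    headroom = arrivals_during_stall(rate_per_s, stall_us)
    return max(0, capacity - headroom) / capacity


if __name__ == "__main__":
    # Hypothetical workload: 2M events/s, 1 ms stall, 4096-slot queue.
    print(arrivals_during_stall(2_000_000, 1000))
    print(dropped_events(4096, 3000, 2_000_000, 1000))
    print(usable_fraction(4096, 2_000_000, 1000))
```

Under these assumed numbers, a single 1 ms stall delivers 2000 events with nobody consuming them, so roughly half of the queue has to sit empty as insurance. That reserved headroom is one concrete form of the "conservative architecture" cost.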
A lot of modern performance comes down to exquisite orchestration, scheduling, and timing in complex processes. A GC is like a giant, slow chaos monkey that randomly destroys the choreography that was so carefully created to produce that high performance.