	by throwaway91111 3228 days ago
	How does ARC hold up for long-lived servers? Are the leaks manageable?

3 comments

coldtea 3227 days ago

What leaks? You only get leaks if you have reference cycles that you forgot to break, or if you don't close non-memory resources that you keep referencing.

Which is not that different from a GC.

kartickv 3223 days ago

No, GCs collect reference cycles, whereas under ARC a strong reference cycle created by an operation that is repeated many times in a long-running server adds up.

Worse, sometimes, you don't even know if you're creating a leak. For example, I recently had to call, given two gesture recognisers a and b:

a.require(toFail: b)

A is a long-lived object. B goes away when the current view controller is popped. But not if A keeps a strong reference to it. Does it? Probably no one without access to the source code of UIGestureRecognizer knows!
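The way such a cycle "adds up" can be reproduced outside Swift. The sketch below is an analogy, not UIKit: it uses C++'s std::shared_ptr as a stand-in for ARC strong references and std::weak_ptr for weak ones. The Handler type, its peer fields, and leaked_after are hypothetical names, and nothing here says how UIGestureRecognizer actually stores the recognizer it waits on.

```cpp
#include <memory>

// Live-object counter so that a leak shows up as a number.
static int g_live_handlers = 0;

// Hypothetical per-request object; `peer` and `weak_peer` are illustrative.
struct Handler {
    std::shared_ptr<Handler> peer;     // strong reference (owns its target)
    std::weak_ptr<Handler> weak_peer;  // weak reference (does not own)
    Handler() { ++g_live_handlers; }
    ~Handler() { --g_live_handlers; }
};

// Simulates `requests` iterations of an operation that links two objects
// to each other, and returns how many objects are still alive afterwards.
int leaked_after(int requests, bool use_weak_back_reference) {
    const int before = g_live_handlers;
    for (int i = 0; i < requests; ++i) {
        auto a = std::make_shared<Handler>();
        auto b = std::make_shared<Handler>();
        a->peer = b;
        if (use_weak_back_reference) {
            b->weak_peer = a;  // no cycle: a's count reaches zero at scope exit
        } else {
            b->peer = a;       // strong cycle: both counts stay at one forever
        }
    }
    return g_live_handlers - before;
}
```

With strong references in both directions, every iteration strands two objects, so the leak grows linearly with the number of requests. With a weak back reference, nothing survives the loop.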

throwaway91111 3227 days ago

Exactly. How is the profiling experience?

breatheoften 3228 days ago

Why should ARC imply leaks?

throwaway91111 3227 days ago

It doesn't.

iainmerrick 3228 days ago

I would expect it to be more reliable than a GC, as its performance and memory usage are more consistent.

rbehrends 3227 days ago

That can cut both ways.

1. Swift's ARC uses atomic reference counting underneath, which is normally very expensive, and relies on compiler optimization to remove as many reference count operations as possible. This is normally pretty effective, but there are situations where it's not possible.

2. Reference counting allows for arbitrarily long pauses as the result of cascading deletions (i.e. where object deletions trigger other object deletions). You can work around that (by deferring deletions), but then you don't have any guarantees about the timeliness of deletions anymore. As far as I know, this is still an open issue for Swift.

3. Without a compaction scheme, you risk memory fragmentation. While this is a rare occurrence in practice, there are workloads where it can happen.

4. Reference counting cannot reclaim cycles without a mechanism for detecting cycles; such a cycle detector (e.g. trial deletion) poses pretty much the same challenges as tracing GC.

Obviously, tracing garbage collectors pose their own challenges; my point is merely that whether performance and memory usage are more consistent has to be judged on a case by case basis.
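The workaround in point 2, deferring deletions, can be sketched as a queue that is drained with a fixed budget per call. This is a C++ illustration with hypothetical names (Node, DeferredReleaser, drain, make_tree), not a description of what the Swift runtime does. It shows the trade-off named above: each pause is bounded, but an object may stay alive long after its last reference went away.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node. Children are owned, so destroying a node would
// normally destroy its whole subtree in one uninterrupted cascade.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

// Takes ownership of dead objects and frees them a few at a time.
class DeferredReleaser {
public:
    void release_later(std::unique_ptr<Node> node) {
        if (node) pending_.push_back(std::move(node));
    }

    // Frees at most `budget` nodes and returns how many were freed.
    std::size_t drain(std::size_t budget) {
        std::size_t freed = 0;
        while (freed < budget && !pending_.empty()) {
            std::unique_ptr<Node> node = std::move(pending_.back());
            pending_.pop_back();
            // Detach the children first, so destroying `node` frees exactly
            // one object instead of cascading through the subtree.
            for (auto& child : node->children) {
                if (child) pending_.push_back(std::move(child));
            }
            node->children.clear();
            node.reset();
            ++freed;
        }
        return freed;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::unique_ptr<Node>> pending_;
};

// Builds a tree in which every inner node has `width` children.
std::unique_ptr<Node> make_tree(int depth, int width) {
    auto node = std::make_unique<Node>();
    if (depth > 0) {
        for (int i = 0; i < width; ++i) {
            node->children.push_back(make_tree(depth - 1, width));
        }
    }
    return node;
}
```

A caller would hand a dead tree to release_later and call drain with a small budget once per frame or per request. The budget caps the work per call, at the cost of garbage lingering in the queue.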

iainmerrick 3227 days ago

1. You're right, it can be slow! But it's usually still consistent and that's useful.

2. Hmm, cascading deletions. Is that really a big problem in practice? I'm skeptical because it seems like that would affect C and C++ programs too, but you rarely hear anyone mention it. Maybe Swift tends to use more objects whereas C++ programmers tend to be better at packing stuff together?

3. Fragmentation -- that's true, but again, it affects C and C++ too. I guess for long-running C/C++ programs you're likely to manage memory pools directly. I don't know if that's possible in Swift.

4. Cycles -- weak references work fine for this. I have never had trouble with cyclic garbage in Objective-C. (I mean, I've had leaks, but they're always easy to spot with a leak detector and easy to fix with weak references.)

Overall, it seems to me that reference-counting adds a small but consistent performance penalty, and otherwise should have comparable runtime behavior to malloc/free in C, which is known to work pretty well when used correctly.

Note that Apple got smooth and reliable 60fps performance on the original iPhone, which was extremely resource-constrained by modern standards, using Objective-C, which isn't usually considered a fast language!

On the GC side, it seems like you typically get bursty, unpredictable performance, in both time and memory. Modern GCs work very hard to keep collection pauses as short as possible, but almost inevitably that means keeping garbage around for longer, which means using a lot of memory.

rbehrends 3227 days ago

1. I think you may not realize what state of the art tracing GCs can accomplish. IBM's Metronome has pause times down to hundreds of microseconds.

2. It only takes freeing a tree with a few thousand nodes for it to become an issue. It happens in C++, too (heck, there've been cases where chained destructor calls overflowed the stack [1]). The reason you don't hear more about it is that pause times just aren't that big a deal for most applications. In forum debates, people always discuss AAA video games and OS kernels and such, but in practice, only a minority of programmers actually have to deal with anything even approaching hard real-time requirements. Generally, most applications optimize for throughput rather than pause times.

3. Yes, and it can be a problem for C/C++, too. It's rare, but not non-existent. Note that pools can actually make fragmentation worse for long-running processes.

4. Weak references work if you get them right. But for long-running processes, even a single error can accumulate over time.

> On the GC side, it seems like you typically get bursty, unpredictable performance, in both time and memory. Modern GCs work very hard to keep collection pauses as short as possible, but almost inevitably that means keeping garbage around for longer, which means using a lot of memory.

This ... is not at all how garbage collectors work, especially where real time is concerned. Not even remotely. I recommend "The Garbage Collection Handbook" (the 2011 edition) for a better overview. And ultra-low pause times are generally more of an opt-in feature, because they're rarely needed.

[1] E.g. Herb Sutter's talk at CppCon 2016: https://www.youtube.com/watch?v=JfmTagWcqoE&t=16m23s
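The stack overflow from point 2 comes from recursive destructors: in a singly linked list owned through std::unique_ptr, the destructor of one node runs the destructor of the next. A common fix, sketched here with hypothetical ListNode and List types, is to unlink nodes in a loop so that every node is destroyed with an empty tail. This removes the recursion but not the pause, since clearing still takes time proportional to the length of the list.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical list node that owns its successor.
struct ListNode {
    int value = 0;
    std::unique_ptr<ListNode> next;
};

class List {
public:
    List() = default;
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    // Without the explicit loop in clear(), destroying head_ would recurse
    // once per node and could exhaust the stack on a long list.
    ~List() { clear(); }

    void push_front(int value) {
        auto node = std::make_unique<ListNode>();
        node->value = value;
        node->next = std::move(head_);
        head_ = std::move(node);
        ++size_;
    }

    // Destroys the nodes one at a time using constant stack space.
    void clear() {
        while (head_) {
            // Detach the tail first; the old head then dies with next == null.
            std::unique_ptr<ListNode> rest = std::move(head_->next);
            head_ = std::move(rest);
        }
        size_ = 0;
    }

    std::size_t size() const { return size_; }

private:
    std::unique_ptr<ListNode> head_;
    std::size_t size_ = 0;
};
```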

iainmerrick 3223 days ago

> > almost inevitably that means keeping garbage around for longer, which means using a lot of memory.

> This ... is not at all how garbage collectors work, especially where real time is concerned.

Hmm, I'm certainly no GC expert, but is it really not the case that GC tends to be memory-hungry? Not exotic academic systems, but the languages people use day-to-day.

Most of my experience with GCs is in languages like Java and C#. Java in particular can be very fast but always seems to be memory-hungry, using like 4x the memory you'd need in C++. I haven't spent a huge amount of time fine-tuning the GC settings (it seems like Oracle is working to simplify that -- good!) but the defaults seem to assume at least 2x memory usage as elbow room for the GC.

That's on the server. On mobile, I've worked with iOS and Android and iOS undeniably gets the same work done with much less memory. Flagship Android phones have 4GB of memory and need it, whereas Apple hasn't felt the need to bump up memory so quickly even after going 64-bit across the board.

The last I heard about real-time GC, with guaranteed space and time bounds, it sounded like it was theoretically solved, but not used much in practice because it was too slow. That was a number of years ago though. Has that situation changed? Are there prominent languages or systems with real-time GC?

iainmerrick 3223 days ago

Looking up IBM's Metronome led me to the Jikes RVM (https://en.wikipedia.org/wiki/Jikes_RVM), which sounds so cool that I wonder why it isn't being used everywhere?

> The PowerPC (or ppc) and IA-32 (or Intel x86, 32-bit) instruction set architectures are supported by Jikes RVM.

Ah, no ARM and no x64, that'd be it.

What's keeping this kind of GC technology back from the mainstream?

throwaway91111 3227 days ago

One note, in video games allocations are a major source of slowdown; don't allocate in your inner loop! Use object pools and arena allocators.
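A minimal sketch of the object-pool idea in C++, assuming a hypothetical Particle type and a capacity chosen up front. All storage is allocated once in the constructor; acquire and release in the inner loop only move indices on a free list. The pool does not detect double releases or pointers that did not come from it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical game object; the fields are placeholders.
struct Particle {
    float x = 0, y = 0, vx = 0, vy = 0;
};

class ParticlePool {
public:
    // Allocates every slot and the free list up front.
    explicit ParticlePool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = capacity; i > 0; --i) free_.push_back(i - 1);
    }

    // Hands out a reset slot, or nullptr when the pool is exhausted.
    // Never touches the heap.
    Particle* acquire() {
        if (free_.empty()) return nullptr;
        const std::size_t index = free_.back();
        free_.pop_back();
        slots_[index] = Particle{};
        return &slots_[index];
    }

    // Returns a slot to the pool. `p` must come from acquire() on this pool
    // and must not be released twice (not checked in this sketch).
    void release(Particle* p) {
        free_.push_back(static_cast<std::size_t>(p - slots_.data()));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<Particle> slots_;
    std::vector<std::size_t> free_;  // indices of unused slots
};
```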

rbehrends 3226 days ago

This is because naive can-do-it-all allocations in C/C++ can be expensive, not because allocations are inherently expensive. In C/C++, you have:

1. A call to a library function that typically cannot be inlined.

2. Analysis of the object size in order to pick the right pool or a more general allocator to allocate from.

3. A traditional malloc() implementation also needs to take a global lock; thread-local allocators are comparatively rare.

4. For large objects, a complex first-fit/best-fit algorithm with potentially high complexity has to be used.

Modern GCs typically use a bump allocator, which is an arena allocator in all but name. In OCaml or on the JVM, an allocation is a pointer increment and comparison.

Even without bump allocators, it's easy for a GC implementation to automatically turn most allocations into pool allocations that can be inlined.

Also: much as people love to talk about video games, games with such strict performance requirements are only one part of the video game industry, and a tiny part of the software industry as a whole.
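The "pointer increment and comparison" can be sketched as a bump allocator in C++. BumpArena and its methods are hypothetical names, and this is not how OCaml or any JVM implements its nursery: real collectors add per-thread buffers and object headers, and trigger a collection on exhaustion instead of returning null.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]),
          cursor_(buffer_.get()),
          limit_(buffer_.get() + capacity) {}

    // Returns `size` bytes aligned to `align` (which must be a power of
    // two), or nullptr if the arena is full. The fast path is one
    // comparison against the limit and one pointer increment.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        const auto addr = reinterpret_cast<std::uintptr_t>(cursor_);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::size_t padding =
            static_cast<std::size_t>(((addr + mask) & ~mask) - addr);
        const std::size_t remaining =
            static_cast<std::size_t>(limit_ - cursor_);
        if (padding > remaining || size > remaining - padding) return nullptr;
        cursor_ += padding + size;
        return cursor_ - size;
    }

    // Frees everything at once. There is no per-object free.
    void reset() { cursor_ = buffer_.get(); }

    std::size_t used() const {
        return static_cast<std::size_t>(cursor_ - buffer_.get());
    }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    unsigned char* cursor_;
    unsigned char* limit_;
};
```

Individual objects are never freed; the whole arena is reset in one step. In a copying or generational GC, the equivalent step is evacuating the survivors and reusing the nursery.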
