by stevefan1999 1119 days ago
Reference counting is slow because it adds an increment/decrement operation every time a reference enters or leaves a scope.

To add insult to injury, the count needs to be atomic if you want it to run on SMP. This means that every time you create or release a reference to an object you issue memory barriers and create cache contention.

But in practice the overhead is close to nought, and most of the time you are dealing with I/O-bound problems rather than the cost of an extra atomic increment. Modern processors are fast enough to handle an uncontended one in a few cycles, on the order of 10 nanoseconds.
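
To make the atomic vs non-atomic distinction concrete, here is a minimal sketch in Rust, where the two flavours are separate types: `Rc` uses a plain counter and `Arc` an atomic one. The function names `churn_rc` and `churn_arc` are made up for illustration, and the timings depend entirely on your machine and optimization level, so treat this as a way to measure rather than as a result.

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::sync::Arc;
use std::time::Instant;

/// Take and drop a non-atomic counted reference `n` times.
/// Returns the strong count left at the end.
fn churn_rc(n: usize) -> usize {
    let shared = Rc::new(0u64);
    for _ in 0..n {
        // Plain (non-atomic) increment; black_box keeps the optimizer
        // from deleting the clone/drop pair entirely.
        let tmp = black_box(Rc::clone(&shared));
        drop(tmp); // plain decrement
    }
    Rc::strong_count(&shared)
}

/// Same loop, but every increment/decrement is an atomic operation.
fn churn_arc(n: usize) -> usize {
    let shared = Arc::new(0u64);
    for _ in 0..n {
        let tmp = black_box(Arc::clone(&shared)); // atomic increment
        drop(tmp); // atomic decrement
    }
    Arc::strong_count(&shared)
}

fn main() {
    let n = 10_000_000;

    let start = Instant::now();
    let rc_count = churn_rc(n);
    let rc_time = start.elapsed();

    let start = Instant::now();
    let arc_count = churn_arc(n);
    let arc_time = start.elapsed();

    println!("Rc:  final count {rc_count}, {rc_time:?}");
    println!("Arc: final count {arc_count}, {arc_time:?}");
}
```

Both loops run on a single thread, so this only shows the uncontended cost of atomicity, not what happens when several cores fight over the same counter.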


Swift's refcounting is atomic (as is objc's). Most benchmarks I've seen show negligible overhead from the added atomicity for uncontended access (the refcount overhead itself is still there). But IME if you do have many threads walking the same data structure, you end up spending stupid amounts of time fighting the refcounting. This applies even if the data structure is immutable and has a guaranteed lifetime, as Swift's type system doesn't seem to allow that to be expressed, and as a result it seems to do a lot of ref churn we'd consider unnecessary.
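
A rough sketch of the two situations described above, written in Rust rather than Swift because Rust lets you spell out "immutable with a guaranteed lifetime" as a plain borrow. The function names are hypothetical. In the first version every visit takes and releases a counted reference, so all threads hit the same counter. In the second, scoped threads borrow the data and never touch a refcount.

```rust
use std::sync::Arc;
use std::thread;

/// Every pass takes and drops a counted reference, so all threads
/// perform atomic updates on the same refcount.
fn sum_with_churn(data: &Arc<Vec<u64>>, threads: usize, passes: usize) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let data = Arc::clone(data);
            thread::spawn(move || {
                let mut total = 0u64;
                for _ in 0..passes {
                    let visit = Arc::clone(&data); // atomic increment
                    total += visit.iter().sum::<u64>();
                    // `visit` is dropped here: atomic decrement
                }
                total
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
}

/// Scoped threads borrow the slice. The scope guarantees the data
/// outlives the threads, so no refcount is involved at all.
fn sum_borrowed(data: &[u64], threads: usize, passes: usize) -> u64 {
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                s.spawn(move || {
                    let mut total = 0u64;
                    for _ in 0..passes {
                        total += data.iter().sum::<u64>();
                    }
                    total
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data = Arc::new((1..=1_000u64).collect::<Vec<_>>());
    let churned = sum_with_churn(&data, 4, 1_000);
    let borrowed = sum_borrowed(&data, 4, 1_000);
    assert_eq!(churned, borrowed);
    println!("both strategies agree: {churned}");
}
```

Both functions compute the same thing; the only difference is whether the shared counter is written on every pass. How much that matters depends on how little work each visit does relative to the refcount update.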
> Swift's refcounting is atomic (as is objc's)

Most of the time it’s possible to avoid atomic instructions and still be thread-safe. https://dl.acm.org/doi/10.1145/3243176.3243195:

“BRC is based on the observation that most objects are only accessed by a single thread, which allows most RC operations to be performed non-atomically. BRC leverages this by biasing each object towards a specific thread, and keeping two counters for each object --- one updated by the owner thread and another updated by the other threads. This allows the owner thread to perform RC operations non-atomically, while the other threads update the second counter atomically. We implement BRC in the Swift programming language runtime, and evaluate it with client and server programs. We find that BRC makes each RC operation more than twice faster in the common case. As a result, BRC reduces the average execution time of client programs by 22.5%, and boosts the average throughput of server programs by 7.3%.”
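
The two-counter idea from the abstract can be sketched roughly like this. This is not the paper's implementation or anything from the Swift runtime: `BiasedCount` and its fields are hypothetical names, and the sketch leaves out everything about how a real scheme merges the counters and decides when the object can be freed. It only shows the split between an owner path that avoids atomic read-modify-write instructions and a slower atomic path for everyone else.

```rust
use std::sync::atomic::{AtomicI64, AtomicU32, Ordering};
use std::thread::{self, ThreadId};

/// Hypothetical, simplified two-counter header in the spirit of BRC.
struct BiasedCount {
    owner: ThreadId,
    /// Written only by the owner thread, using a plain load followed by
    /// a plain store (no atomic read-modify-write instruction).
    biased: AtomicU32,
    /// Updated by all other threads with atomic read-modify-write.
    /// Signed in this sketch, because a reference taken by the owner
    /// may end up being released by a different thread.
    shared: AtomicI64,
}

impl BiasedCount {
    fn new() -> Self {
        BiasedCount {
            owner: thread::current().id(),
            biased: AtomicU32::new(1), // the creator holds the first reference
            shared: AtomicI64::new(0),
        }
    }

    fn retain(&self) {
        if thread::current().id() == self.owner {
            // Fast path: only the owner ever writes this counter.
            let n = self.biased.load(Ordering::Relaxed);
            self.biased.store(n + 1, Ordering::Relaxed);
        } else {
            // Slow path: other threads may race with each other.
            self.shared.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn release(&self) {
        if thread::current().id() == self.owner {
            let n = self.biased.load(Ordering::Relaxed);
            self.biased.store(n - 1, Ordering::Relaxed);
        } else {
            self.shared.fetch_sub(1, Ordering::AcqRel);
        }
    }

    /// Sum of both counters. Only meaningful when no other thread is
    /// updating them concurrently.
    fn total(&self) -> i64 {
        i64::from(self.biased.load(Ordering::Acquire)) + self.shared.load(Ordering::Acquire)
    }
}

fn main() {
    let rc = BiasedCount::new();
    rc.retain(); // owner path, no atomic RMW

    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                rc.retain(); // non-owner path, atomic
                rc.release();
            });
        }
    });

    rc.release(); // owner path again
    println!("live references: {}", rc.total());
}
```

The owner counter is still declared as an atomic type so the sketch stays in safe Rust, but the owner only ever does separate relaxed loads and stores on it, which is the cheap part of the trade.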

I remember reading that this made it into Swift, but cannot find it, so I’m not sure anymore.

And of course, the Swift compiler tries to avoid unnecessary refcount updates.

On Apple hardware, uncontended refcounting (Swift or objc) has the same perf as non-atomic refcounting. The cost exists, but it isn't terrible. Once there's contention between threads, though, the perf drops through the floor. The real killer is that there are a bunch of places where the Swift evaluation model means they're forced to ref churn, which comes up in 100% typical workloads like the million triangle objects in my Swift raytracer, all being hit by numerous threads :D

IME Swift’s refcounting is either incredibly inconsequential or a dealbreaker, with very little in between. They’ve done a very good job of optimizing it to the point where it’s barely measurable even in perf sensitive code… until you hit the scenarios where it completely murders performance and there’s nothing you can do about it.

Hopefully the upcoming ownership functionality will help in those cases.