Hacker News new | ask | show | jobs
by flohofwoe 2238 days ago
> This means that it maintains the ability to insert freeing code at appropriate program points, without putting restrictions on how you write your code.

How does the approach in mitten compare to Automatic Reference Counting in Objective-C (and I think Swift too)? From my experience, ARC can still add a surprising amount of memory management overhead to a program and needs a lot of hand-holding to keep that overhead down to an acceptable level (e.g. low single-digit percentage of overall execution time in programs that talk to Obj-C APIs a lot). I would be surprised if a "traditional GC" can do any worse in that regard (maybe reference counting smears the overhead over a wider area, e.g. no obvious spikes, but instead "death by a thousand cuts").

One thing I'd like to see in modern languages is to encourage and simplify working with an (almost) entirely static memory layout, and make manipulations inside this static memory layout safe. This static memory layout doesn't need to be magically derived by the compiler as long as the language offers features to easily describe this memory layout upfront.

A lot of data structures in applications don't need to live in "short-lived" memory regions, but they often do because that's what today's languages either encourage (e.g. when built on the OOP philosophy), or what happens under the hood without much control from the code (e.g. in "reference-heavy" languages like Javascript, Java or C# - or even "modern C++" if you do memory management via smart pointers).

Minimizing data with dynamic lifetime, and maximing data with static lifetime could mean less complexity in the language and runtime (e.g. lifetime tracking by the compiler, or runtime memory management mechanisms like refcounting or GCs).

5 comments

"How does the approach in mitten compare to Automatic Reference Counting in Objective-C (and I think Swift too)? From my experience, ARC can still add a surprising amount of memory management overhead to a program and needs a lot of hand-holding to keep that overhead down to an acceptable level (e.g. low single-digit percentage of overall execution time in programs that talk to Obj-C APIs a lot). I would be surprised if a "traditional GC" can do any worse in that regard (maybe reference counting smears the overhead over a wider area, e.g. no obvious spikes, but instead "death by a thousand cuts")."

The reference counts have to be incremented when a new reference is made and decremented when one is deleted; freeing the memory when the count goes to zero. (This activity is cache- and atomicity-unfriendly (in the presence of threads).) A sufficiently smart compiler can omit many if not most of the count activity, but this kind of static analysis promises to remove all of it.

Further, reference counting has difficulty with circular references as the counts never go to zero. This should also be able to handle that.

Both this and reference counting are likely victims of the "death by a thousand cuts" you mention, as well as "drop the last pointer to a large structure and wait for a long time while the pieces are deleted"---the reference counting equivalent of a stop-the-world-and-trace garbage collection.

”This activity is cache- and atomicity-unfriendly (in the presence of threads).“

Indeed it is. https://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf states reference counting takes 42% of execution time in client programs and 15% in server programs.

Luckily, they also present amore cache-friendly variation on reference counting that halves that overhead.

They modified the Swift compiler, so I think there’s a decent chance we’ll see this added to Swift.

Swift also, I think, is somewhat designed around the inefficiency of reference counting by promoting the use of structs for small objects (structs in Swift are value objects and not reference-counted in the language).

From what I understood, it’s not reference counting but trying to determine at compile time when to drop using data flow analysis to come up with an approximation of the liveness.

I had a thought sometimes back, can compilers do a profile run to get information about the liveness of objects it couldn’t determine statically by dumping gc info?

> get information about the liveness of objects it couldn’t determine statically by dumping gc info?

Well it may be possible to optimize via profiling (this is what PGO is), but this of course wouldn’t be a static analysis, so it would just allow some optimizations on dynamic access.

In theory you could use profiling as part of the implementation of a theorem prover, but I haven’t seen any examples where that is more effective then the conventional methods of data flow analysis. And in this case, it would still be static analysis.

Most programs are complex, have lots of branching, so this wouldn't work.
From the ASAP dissertation (https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf):

> Reference counting, like asap, is a safe, synchronous memory management strategy. However, rc approximates waste by unreachability which is less timely than asap’s approximation by Access.

I think a more careful reading of the work is required to distinguish the precise meanings of "unreachability" and "access" in this context.

As I understand it:

- unreachability means there's no live reference to that object. - access means somebody is going to use this object again. If you have a live reference but won't ever use it again it's a kind of leak (that's how GCed languages can leak memory without any manual management).

> One thing I'd like to see in modern languages is to encourage and simplify working with an (almost) entirely static memory layout, and make manipulations inside this static memory layout safe.

This sounds interesting! What do you mean by (almost) static memory layout? Fixed sizes for everything in a contiguous region, or multiple growing arrays, or something else entirely?

In recent time I have written a few programs in a way that uses only realloc inside dynamic arrays for memory management and never frees. This leads to a big semi-global struct holding all the dynamic arrays, which I am reminded of. (Local functions can then take those arrays, use them, and give them back once they return.)

I (mostly) returned to C a little while ago, and for smaller things I sometimes create the entire application state as a single, big struct that's made of many smaller nested structs, maybe with one or very few "layers" of dynamically allocated data dangling off from the static "root structure" (but only when really needed).

A very simple example is an all-in-one application data structure like this:

https://github.com/floooh/v6502r/blob/1d2b79ac11d7828b2722b5...

This very simple approach has some downsides of course, mostly because C doesn't help much to solve some problems like a more specialized language could (but on the other hand, it also doesn't get in the way much):

- Every part of the program sees and is allowed to change everything, so it would be nice to have a simple syntax for fine-grained visibility and access rules (but not at all like C++ public/private, more like compile-time read-only and read-write access tokens).

- Not much compile- and runtime-protection from code scribbling over neighboring nested structs.

- Not much flexibility for dynamic arrays. It would be good to have 3 flavors: (1) compile-time max capacity which can be completely embedded, (2) a runtime max capacity array, which is allocated once but can never grow, and (3) a fully dynamic array which can grow (but maybe never shrink?). Such dynamic arrays should never change their base address, so that memory locations remain stable.

- It's not well suited for bigger programs built from many modules. It should be possible to have highly modular program code, but still end up with a single monolithic "root data layout".

One great side effect of this approach is that it feels completely natural to not do dynamic memory allocation all over the place (and one of the good features of C is that memory allocation is always very obvious - and thus easy to avoid).

Isn’t that just a variant on organizing your globals well to make using lots of globals manageable?

That’s what historically was done in languages such as FORTRAN and COBOL (both of which had compile-time memory management, but managed to do that by completely dropping memory allocation at runtime)

And yes, that meant dropping all dynamic memory allocation, too. If your wanted to run your FORTRAN program on a larger data set, you replaced the cards defining the dimensions of your arrays and recompiled.

Most likely! I never wrote FORTRAN or COBOL, but instead was too heavily influenced by the OOP hype of the 90s, and I think this hype still heavily lingers on even in the current "post-OOP" world. It feels like memory management is still heavily stuck in the "every small thing in a program needs its own lifetime" mindset.

Remembering how I did "memory management" in my early 8-bit assembler programs (e.g. not at all, just figure out what's needed upfront and assign every single data item its fixed address) was when I realized that dynamic allocation actually isn't all that important as I assumed the whole time.

But I'd like a "modern approach" and modern tooling for that very old idea :)

I reached a similar conclusion. Dynamic memory is seductive for the task of indefinite scaling, but a practical system always encounters bottlenecks that aren't along the memory management axis, and in the meantime your code is much harder to verify.

NB: Historically, game engines have tended towards object pooling at runtime without any dynamic allocations. In that case there is a defined limit to what a scene will accommodate, and the object counts often simply reflect the other performance bottlenecks involved.

GC is still nice, but mostly in the sense of stitching together the most dynamic elements of the system. You don't want to have to trace a ton of stuff, and that also leads in the direction of flattening the data and making it more manual and static as the type system allows.

> It's not well suited for bigger programs built from many modules. It should be possible to have highly modular program code, but still end up with a single monolithic "root data layout".

You can still have modules per file, where the globals can be thought as members of a virtual root. And the globals can be either shared or private to the module.

I have seen very complex programs done like that. There are some advantages like proving the program does not run out of memory, but the problem is that you end up with code that is harder to reuse, harder to test, etc. as soon as you start sharing state.

But yeah, it is sometimes done like that.

Reference counting happens at runtime, this happens at compile time.
But with ARC the compiler also does compile-time tracking to figure out when object references are shared and unshared, and based on that analysis inserts retain/release calls at the right points in the program (or more importantly: avoids the refcount overhead completely when it is not needed - e.g. when ownership is moved, not shared).

If mitten can do complete compile-time analysis also for all sorts of shared references and thus can avoid refcounting completely at all times then this would indeed be a nice improvement.

I didn't know that some reference-counting languages optimize away some cases of runtime counting by attempting to track ownership, though it makes sense. But I think when people say "reference counting" they mean the naiive approach. Even Rust has reference-counting structures, if you need them, and they actually expand what you're allowed to do with those values because you're partially stepping outside of the ownership system (at the cost of runtime performance).
It's pretty easy to optimise away some reference counting operations. For example if you allocate something on the heap that is not returned from the function, nor passed to any other function, or captured in a closure, then you know it will be dead at the end of the function, so you don't need to emit reference counting operations for it.
Yeah. I guess Rust's secret sauce is just handling the long tail of much harder cases
Probably yeah. One thing is that it helps to have a statically typed language to do these optimisations, since you can used type-based analysis for them. So dynamically typed languages like Lisp or Python are going to have a harder time doing them.