Hacker News new | ask | show | jobs
by vlovich123 440 days ago
Why do you treat memory allocations as a special resource that should have different reasoning about lifetime than something like a file resource, a db handle, etc etc? Sure if you don’t care about how much memory you’re using it solves a small niche of a problem for a lot of overhead (both CPU and memory footprint) but Rust does a really good job of making it easy (and you rarely / never really have to implement Drop).

The context managers and stuff are a crutch and admittance that the tracing GC model is flawed for non memory use cases.

2 comments

Memory is freed as soon as the program closes by the operating system. If you have an open connection to a database they might expect you to talk to them before you close the handle (for example, to differentiate between crash and normal shutdown). Any resource shared between programs might have a protocol that needs to be followed which the operating system might not do for you.

The GC model only cares about memory, so I don't really understand what you mean by "flawed for non memory use cases". It was never designed for anything other than managing memory for you. You wouldn't expect a car to be able to fly, would you?

I personally like the distinction between memory and other resources. If you look hard enough I'm sure you'll find ways to break a model that conflates the two. Similar to this, the "everything is a file" model breaks down for some applications. Sure, a program designed to only work with files might be able to read from a socket/stream, but the behavior and assumptions you can make vary. For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.

In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.

> to differentiate between crash and normal shutdown

> Any resource shared between programs might have a protocol that needs to be followed which the operating system might not do for you.

Run far and quickly from any software that attempts to do this. It’s a sure fire indication of fragile code. At best such things should be restricted to optimizing performance maybe if it doesn’t risk correctness. But it’s better to just not make any assumptions about reliable delivery indicating graceful shutdown or lack of signal indicating crash.

> In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.

C++ and Rust do provide compelling reasons why memory is no different from other resources. I think the counter is the better position - you have to prove that a memory resource is really different for the purposes of resource management than anything else. You can easily demonstrate why network and files probably should have different APIs (drastically different latencies, different properties to configure etc). That’s why network file systems generally have such poor performance - the as-if pretension sacrifices a lot of performance.

The main argument tracing GC basically makes is that it’s ok to be geeedy and wasteful with RAM because there’s so much of it that retaining it extra is a better tradeoff. Similarly it argues that the cycles taken by GC and random variable length pauses it often generates don’t matter most of the time. The counter is that while it probably doesn’t matter in the P90 case, it does matter when everyone takes this attitude and P95+ latencies (depending on how many services are between you and the user you’d be surprised how many 9s of good latency your services have to achieve for the eyeball user to observe an overall good P90 score).

> For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.

Right, which is one of many reasons why you shouldn’t ever assume files you’re given are finite length. You could be passed stdin too. Of course you can make simplifications in cases, but that requires domain-specific knowledge, something tracing GCs do not have because they’re agnostic to use case.

Because 99.9% of the time the memory allocation I'm doing is unimportant. I just want some memory to somehow be allocated and cleaned up at some later time and it just does not matter how any of that happens.

Reasoning about lifetimes and such would therefore be severe overkill and would occupy my mental faculties for next to zero benefit.

Cleaning up other resources, like file handles, tends to be more important.

Then stick it in a Box, RC or Arc and forget about it. You only need lifetimes for cases where you want to avoid heap allocations.
Simple reference counting fails as soon as you have a graph that might have loops in it.
Then use “tracing GC as a library” like [1] or [2]. I’m not saying there’s no use for tracing GC ever. I’m saying it shouldn’t be a language level feature and it’s perfectly fine as an opt-in library.

[1] https://github.com/Manishearth/rust-gc

[2] https://github.com/oilpan-gc/cppgc

Bolt-on GCs necessarily have to be conservative about their assumptions, which significantly hinders performance. And if language semantics doesn't account for it, the GC can't properly do things like compacting (or if they can, it requires a lot of manual scaffolding from the user).

It should be a language level feature in a high-level language for the simple reason that in vast majority of high-level code, heap allocations are very common, yet most of them are not tied to any resource that requires manual lifetime management. A good example of that is strings - if you have a tracing GC, the simplest way to handle strings is as a heap-allocated immutable array of bytes, and there's no reason for any code working with strings to ever be concerned about manual memory management. Yet strings are a fairly basic data type in most languages.

That's my point though. You either have graph structures with loops where the "performance" of the tracing GC is probably irrelevant to the work you're doing OR you have graph ownership without loops. The "without loops" is actually significantly more common and the "loops" case actually has solutions even without going all the way to tracing GC. Also, "performance" has many nuances here. When you say "bolt on GC significantly hinders performance", are you talking about how precisely they can reclaim memory / how quickly after being freed? Or are you talking about the pauses the collector has to make or the atomics it injects throughout to do so in a thread-safe manner?

I suspect the benefits of compaction are wildly overstated because AFAIK compaction isn't cache aware and thus the CPU cache thrashes. By comparison, a language like Rust lets you naturally lay things out in a way that the CPU likes.

> if you have a tracing GC, the simplest way to handle strings is as a heap-allocated immutable array of bytes

But what if I want to mutate the string? Now I have to do a heap allocation and can't do things in-place. Memory can be cheap to move but it can also add up substantially vs an in-place solution.