by _gabe_ 1415 days ago
> trigger deallocation of a large object graph and it's not always clear by just looking at a code what will happen

If you can't understand what's happening when an object gets freed, it may be a sign that your code is too tightly coupled and/or becoming spaghetti. I've found that the more graph-like my data structures become, the more inadvertent complexity I'm adding. The whole reason we talk about data normalization is to avoid these types of couplings.


Come on, object graphs are completely dynamic; no one can say where “a variable will go out of scope” unless we literally have a hello world.

Do you honestly claim that you know when deallocations happen in any codebase full of conditionals depending on outside effects (user input, network, etc)?

I mean... people program large developments in Rust with pretty minimal use of Arc. It's clearly possible to structure many programs this way (though they do not necessarily look like programs in other languages). It's important not to make extreme and definitive statements about what's realistic that are contradicted by a bunch of existing programs... in any case this has little bearing on the reference counting vs tracing GC performance thing. It also might be instructive to look at how Rust frameworks deal with your UI example: mostly, by keeping a vector of UI widgets and reusing the allocation over and over. Therefore, you shouldn't see a significant pause there (you could probably argue the pause comes when resizing the vector technically, but it is generally efficient enough that you're not going to notice it unless the vector is very, very large).

Statically adding code at the end of scope-leaves is different than knowing when the deallocation will happen.

The borrow checker doesn’t know when the scope will end; it only knows that, however it ends, the invariants it cares about are upheld. You might call two entirely different code paths inside a method, and Rust will only execute the dealloc logic at the end, at a non-deterministic time. But maybe we just use different terminology here.

Fun fact: Rust actually had a proposal at one point to execute drops entirely statically, with no runtime flags on the stack. It was decided against because people thought it would be too confusing as it would be hard to tell when the destructor would run, but for purely memory related destructors it would probably be acceptable.
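
To make the "runtime flags" point concrete, here is a minimal sketch of a conditionally moved value. `Noisy` and `run` are hypothetical names made up for illustration; the log only records the order in which things happen. As far as I understand it, when a value may or may not have been moved, the compiler tracks that with a hidden flag and decides at scope exit whether the destructor still has to run (the optimizer may remove the flag when it can prove the answer).

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when its destructor runs.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the sequence of events observed for one call.
fn run(consume_early: bool) -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let value = Noisy { name: "value", log: Rc::clone(&log) };
        if consume_early {
            // Moving `value` into `drop` runs the destructor right here.
            drop(value);
        }
        log.borrow_mut().push("end of scope".to_string());
        // If `value` was not moved above, its destructor runs here, at the
        // closing brace. Which case applies is only known at runtime.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", run(true));
    println!("{:?}", run(false));
}
```

So the place where the drop *can* happen is fixed in the source, but which of the two places actually runs it depends on the branch taken.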

I do see what you're trying to get at, I think, but it's also worth noting that the use of stuff like arenas and vectors to absorb the cost of the repeated reallocations goes a long way here towards making deallocation times predictable in practice (if not in theory). It is certainly the case that you can mostly reduce the deallocation overhead to ~ zero for any particular part of your Rust program without that much effort, unless you are writing an interpreter for a different language that expects GC semantics (at least, that's been my experience).
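
A rough sketch of the vector-reuse pattern, assuming a made-up `Widget` type and a hypothetical `rebuild` helper (not taken from any particular framework): the buffer is cleared and refilled each frame, so the backing allocation is kept rather than freed and reallocated.

```rust
/// Hypothetical widget type used only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Widget {
    id: u32,
}

/// Rebuilds the widget list for one frame, reusing the buffer's allocation.
fn rebuild(widgets: &mut Vec<Widget>, count: u32) {
    // `clear` drops the old widgets but keeps the allocated capacity.
    widgets.clear();
    widgets.extend((0..count).map(|id| Widget { id }));
}

fn main() {
    let mut widgets: Vec<Widget> = Vec::with_capacity(1024);
    let capacity_before = widgets.capacity();
    for frame in 0..3 {
        rebuild(&mut widgets, 100 + frame);
        // No reallocation happened: the capacity is unchanged.
        assert_eq!(widgets.capacity(), capacity_before);
    }
    println!("{} widgets, capacity {}", widgets.len(), widgets.capacity());
}
```

The individual widgets are still dropped on `clear`, but for plain data like this that is cheap; the allocator is only involved again if the list outgrows its capacity.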

> Do you honestly claim that you know when deallocations happen in any codebase full of conditionals depending on outside effects (user input, network, etc)?

Yes. C programs have been doing this for over 40 years now. A leak-free C program has a matching free for every malloc, which means its authors know exactly when everything gets allocated and freed.

That just means that every allocation has a pair that frees it - that’s different from knowing how many allocations happen, when they happen, and when the corresponding free happens.

For a simple example, take a text editor that allocates an object for each line of text (sure, you would likely allocate a much bigger buffer in practice) and adds the object’s pointer to a list to be freed - this freeing happens when the user closes the open text file window. While you do know that every allocation will be freed and know their relative order, you don’t know anything more specific - will the user open multiple such files first, close them in some random order, etc.
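
Here is a small sketch of that editor example, written in Rust rather than C, with hypothetical `Line` and `Window` types and a counter added purely to observe frees. It shows both halves of the point: the frees are tied to the close event in the source, but the order and timing of the close events themselves come from the user.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// One line of text; `freed` is a hypothetical instrument that counts frees.
struct Line {
    text: String,
    freed: Rc<Cell<usize>>,
}

impl Drop for Line {
    fn drop(&mut self) {
        self.freed.set(self.freed.get() + 1);
    }
}

/// An open file window that owns every line it allocated.
struct Window {
    lines: Vec<Line>,
}

impl Window {
    fn open(contents: &str, freed: &Rc<Cell<usize>>) -> Window {
        let lines = contents
            .lines()
            .map(|l| Line { text: l.to_string(), freed: Rc::clone(freed) })
            .collect();
        Window { lines }
    }

    fn char_count(&self) -> usize {
        self.lines.iter().map(|l| l.text.len()).sum()
    }

    /// Closing consumes the window, so all of its lines are freed here.
    fn close(self) {}
}

fn main() {
    let freed = Rc::new(Cell::new(0));
    let first = Window::open("fn main() {\n}\n", &freed);
    let second = Window::open("a\nb\nc\n", &freed);
    println!("chars: {} and {}", first.char_count(), second.char_count());
    // The user happens to close the second window first.
    second.close();
    println!("freed after closing second: {}", freed.get());
    first.close();
    println!("freed after closing first: {}", freed.get());
}
```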

It's not different though. You asked:

> Do you honestly claim that you know when deallocations happen in any codebase full of conditionals depending on outside effects (user input, network, etc)?

And the answer to that is yes. You even admitted that here:

> While you do know that every allocation will be freed and know their relative order

This is a lot more specific than "the GC will free this memory at some indeterminate point that it deems acceptable".

Additionally, in your specific example:

> this freeing happens when the user closes the open text file window

So you can plan for that. You can pop up a saving screen if you run tests and realize that the deallocations take a bit of time. With GC, it's luck of the draw. I'm speaking from experience.

I wrote a tool that used Roslyn to do some transpiling of a custom format in our company. It was very important that it ran fast, since this algorithm was going to be run in a time-sensitive situation. And it had to free its memory as soon as it was done using it, since it was hot swapping the DLLs and I needed to make sure I wasn't getting name collisions from DLLs that were waiting for the GC to run to fully unload the old DLLs. I tried so many different ways to tell the C# GC to collect the object tree that I knew was no longer necessary, but it was seemingly impossible.

Microsoft even has a page dedicated to debugging why an assembly won't unload and it reads:

> The difficult case is when the root is a static variable or a GC handle.[0]

Now you may say that this whole problem only arises because of my weird specific use case, which is true, but I couldn't change my requirements since those were hard requirements from my company. This could all easily be solved if a GC allowed you to define the concept of ownership. I had references hanging around that didn't matter because the object holding the references didn't own that memory.

All that to say, yes, you can know exactly when your memory will be deallocated in a language like C. In a language like C#, you're left to the whims of the GC, which can be a deal breaker in lots of cases.

What really gets me, too, is that languages like C# end up creating DI frameworks with the "novel" concept of lifetime requirements to make sure object lifetimes are properly scoped. What the heck? It's got a GC. Why did they go through all that trouble if the GC just cleans it up for you? If I have to think about object lifetimes, I may as well switch to a language that makes that explicit rather than a language that obfuscates it as much as possible.

[0]: https://docs.microsoft.com/en-us/dotnet/standard/assembly/un...

Well, you can always go a layer below and use a bytebuffer (or its C# equivalent). You can drop it at the end of the task in an instant.

But I don’t think that giving up the comfort/performance of a good GC is a good tradeoff in all the other cases. The same way Rust et alia can opt into some form of GC with (A)RC, GC languages can have escape hatches as well.

In most cases, yes, you should be able to tell when deallocations are going to happen once you know the inputs.
But what if my application is, say, a diagramming GUI where the user can create many nested items? When they delete a million items by removing a top-level item, how are you going to avoid a pause if using single-threaded synchronous RC? Per-object determinism doesn't mean systemic determinism on a dynamic graph.

You're not going to avoid it. But you will know that it'll happen at that exact moment.
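
A minimal sketch of both sides of this exchange, using `Rc` and a hypothetical `Item` tree (the names, the counter and the tree shape are all made up for illustration). Dropping the last reference to the top-level item frees the entire subtree synchronously, on the calling thread, at that line; an extra reference held elsewhere moves that point to wherever the last reference goes away.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::time::Instant;

/// Hypothetical diagram item; `freed` only counts destructor runs.
struct Item {
    children: Vec<Rc<Item>>,
    freed: Rc<Cell<usize>>,
}

impl Drop for Item {
    fn drop(&mut self) {
        self.freed.set(self.freed.get() + 1);
        // After this body returns, `children` is dropped, which releases
        // each child in turn and recurses down the tree.
    }
}

/// Builds a tree with the given depth and number of children per item.
fn build(depth: u32, fanout: usize, freed: &Rc<Cell<usize>>) -> Rc<Item> {
    let children = if depth == 0 {
        Vec::new()
    } else {
        (0..fanout).map(|_| build(depth - 1, fanout, freed)).collect()
    };
    Rc::new(Item { children, freed: Rc::clone(freed) })
}

/// Counts the items in a subtree, including its root.
fn size(item: &Item) -> usize {
    1 + item.children.iter().map(|c| size(c)).sum::<usize>()
}

fn main() {
    let freed = Rc::new(Cell::new(0));
    // Roughly a million nested items under one top-level item.
    let root = build(3, 100, &freed);
    println!("items: {}", size(&root));

    let start = Instant::now();
    drop(root); // the whole graph is released here, before the next line runs
    println!("freed {} items in {:?}", freed.get(), start.elapsed());
}
```

How long that `drop` takes depends on the machine and allocator, so no timing is claimed here; the sketch only shows where the work lands.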

Whether that is actually important or not depends on the use case. Personally, I think that GC is plenty good enough for most GUI apps other than games, and allows for non-contorted modelling of said GUI (e.g. with backreferences where they make sense).

You say, "you will know that it'll happen at that exact moment". I'm curious what you mean by this.

Do you mean that the user will know? Well sure, that's the pain point to avoid in this case. Anyone who has tried to quit certain versions of various browsers after a long session with many tabs, etc. will know this pain when closing a window. Server side applications can have similar issues.

Or do you mean the code will "know"? That is, the code will need to predict, at runtime, that a code path will be expensive and choose a memory release strategy based on some criteria?

Or do you mean the designer of the code will know, and avoid RC before implementing?

Honest question, I'd like to understand your perspective. Thanks.

This was specifically a response to:

> you know when deallocations happen in any codebase full of conditionals depending on outside effects

What I'm saying is that the author of the code, and anyone else who can read and understand it, will know that, if the user does X, Y, and Z, it'll trigger a deeply nested release of an object graph that'll cause a period of non-responsiveness visible to the user.

Whether the author will consider this acceptable or not is a different question.