Hacker News new | ask | show | jobs
by barrkel 3042 days ago
The problematic code in the article is, AFAICT, using an object finalizer to free manually allocated memory; such approaches seldom work well, even with precise GCs.

Thread stacks are effectively manually allocated blocks of memory. You create a thread, which allocates the stack, and as long as the thread lives, the stack is kept alive - it's self-sustaining. The thread must die by explicit programmatic action, which in turn will free its allocated block of stack memory.

Using finalizers at all is usually an anti-pattern in a GC world. The presence of finalizers is a very strong hint that the GC is being used to manage resources other than memory, something that GC is a poor fit for, because other resources almost certainly have no necessary correlation with memory pressure; and GCs usually only monitor GC heap memory pressure.

That's not to say that there aren't plenty of edge cases where you can end up with lots of false roots that artificially lengthen object lifetimes with a conservative GC. Putting a thread stack in your cycle of object references and relying on GC pressure to break the cycle isn't a strongly motivating one to my mind, though.

4 comments

Absolutely agree with you about finalizers! However, please note that this "threadReaper" code is from JDK class, so, the problem can appear on every application that just use Timer class.

Of course, there are many other examples of false-roots, but this concrete class caused unexpected OOMs on several applications of our clients, so we made this small sample and used it for sanity checking during implementing precise GC (and then mentioned it in the post).

I take it you are part of the ones behind the article? Did you ever see the papers about a conservative variant of immix, which would be both compacting and conservative?
Yes, I wrote this article.

No, I haven't read those papers before, but after your comment I took a look at one of them.. and it is very interesting! Thanks for mentioning it.

But, if I understood correctly, the excess memory retention is still a problem in conservative immix?

It is mentioned in one of the papers (or possibly the doctoral thesis, which is nice as it gives an overview of the whole chain of improvements) that it can still happen but that it is much better compared to what they benchmarked it against (boehm, iirc). Hard to tell if it delivers what it promises though..
It would be very interesting to run our sample with Timers on this GC. In that paper I found a link to their implementation, but unfortunately it is unavailable.
Link is dead, but code is on Github somewhere. Dunno where I found the link to it but I am away from computer for a few days so I can't really look it up for you.
I didn't know it was from the JDK, interesting.
Your argument against finalizers completely ignores that FFI is a thing. Finalizers can be managing things that are memory, just not memory that the GC allocated because it is coming from a foreign system.

You also have to use them as a safe-guard against programmers that are not used to manually managing scope failing to manually manage scope. In some cases it's also just not a practical expectation, as the language is structured entirely around the idea that you aren't supposed to be manually managing scope. An open file that weaves its way throughout an application as it's a constant read/write backing for something? That can often be non-trivial to figure out when to call close() on it exactly, and by the time you've written such a system you've just re-invented manual reference counting (error prone) or a garbage collector of some sort anyway, and should have just used a finalizer.

Calling finalizers an anti-pattern only works to the extent that you can ban everything not controlled by GC'd world. Which would be great, except that you can't even do "hello world" with that constraint.

> Your argument against finalizers completely ignores that FFI is a thing

Nope. Memory that is indirectly allocated via FFI is not normally[0] accounted for by GC memory pressure and so it should be managed explicitly, not using finalizers. That memory is invisible to the GC. It won't know to collect it. It won't know when the foreign heap has allocated too much, and it won't know to run a more expensive GC collection to try harder when foreign space gets tight.

(If you have a relatively infinite amount of RAM, or your FFI object sizes are a constant factor of the size of their GC world counterparts and you account for this in your GC max heap size, you may get away with it. But these constraints aren't typical.)

> That can often be non-trivial to figure out when to call close() on it exactly, and by the time you've written such a system you've just re-invented manual reference counting (error prone) or a garbage collector of some sort anyway, and should have just used a finalizer.

You're right that it can be hard to figure these things out. But figure them out you must, for any long-lived program using scarce resources, or you're just creating a problem for the future.

Working these things out is not actually rocket science. It's what we did in the days before GC, and it was actually tedious more than difficult, because the best way to do it meant dogmatically following certain idioms when programming, being rigorous in your error handling and consciously aware of ownership semantics as data flowed around the program.

We even developed a bunch of strategies to make it simpler, e.g. arena allocation and stack marker allocation. You can adapt these approaches for deterministic resource disposal in GC environments too (e.g. keep a list of things to dispose, or markers on a stack of things to dispose).

The biggest wins from GC are from two effects: memory safety and expression-oriented programming. Memory safety means you never get dangling pointers or dynamic type errors from reused memory, a major increase in reliability, as well as making certain types of lock-free programming much easier. Expression-oriented programming means you can safely and easily write functions that take complex values and return complex values without thinking too hard about the ownership of these complex values. This in turn lets you program in a functional style that is much harder without GC.

What GC doesn't give you is a world free of non-functional requirements. You still need to know about memory allocation of your big algorithms and program overall; you need to know where big object graphs get rooted for longer periods of time before dying (the middle age problem[1]), and you need to track the ownership of your resources rigorously, or you will run into resource exhaustion and non-deterministic failure modes - some of the worst kinds of failures.

[0] https://docs.microsoft.com/en-us/dotnet/api/system.gc.addmem...

[1] https://blogs.msdn.microsoft.com/ricom/2003/12/04/mid-life-c...

> Nope. Memory that is indirectly allocated via FFI is not normally[0] accounted for by GC memory pressure and so it should be managed explicitly, not using finalizers. That memory is invisible to the GC. It won't know to collect it. It won't know when the foreign heap has allocated too much, and it won't know to run a more expensive GC collection to try harder when foreign space gets tight.

I'd call that a bug in the GC's design and a problem with its particular implementation of finalizers. Not a problem with the concept of finalizers, and indeed not a problem with all GCs/finalizer systems, as you linked. Android's runtime also allows for finalizers to pressure the heap to avoid this problem: https://android.googlesource.com/platform/libcore/+/master/l...

> Working these things out is not actually rocket science. It's what we did in the days before GC

Non-GC languages have facilities to help with this. GC'd languages don't.

The presence of the GC makes this problem worse by design, so you can't just pretend that the GC isn't involved here. It is. It changed the design of the language. It makes manual tracking harder than it otherwise would be. Therefore it is part of the GC's responsibility to help with the problem it caused.

This is right: finalizers are difficult to combine with garbage collection for several reasons. As you say, there's no guarantee that the garbage collector will find the finalizable object in a timely manner. Additionally, if it's a conservative collector, then the finalizable object may be pinned by an ambiguous reference and the collector will be unable to free it. In multi-threaded programs the collector runs asynchronously from the point of view of the code, so if the garbage collector calls the finalizer then the finalization code may not be able to depend on invariants of data structures. It is hard to write correct finalization code under these conditions.

A 2002 technical report by Hans Boehm discusses these difficulties. http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html

By your reasoning, everything is manually allocated. When we call (cons 1 2) in Lisp, that's a manually allocated cons. Most objects come into life due to some "manual" construction in the program!

> The thread must die by explicit programmatic action, which in turn will free its allocated block of stack memory.

Simply, no. A thread can recurse through some function activations and then hit an exit statement which terminates it without destroying those activations. Other threads can have pointers into that thread's stack, so the stack must not be reclaimed until they let go.

For instance, a tread allocates a reply box on the stack and calls some message passing API to send a message, whereby it registers the reply box. That API now has a pointer into the thread's stack. Suppose the thread dies before unregistering the reply box. If that can happen, the stack has to stick around. When the API tries to reply to the thread, and finds that the thread is dead, it can dequeue the reply box at that time; then the stack becomes unreachable. If the stack is reclaimed before then, then this messaging API will be traversing bad pointers through this registered message box.

> because other resources almost certainly have no necessary correlation with memory pressure

This is true and there is a way to regard finalization as decoupled from GC.

I have experience implementing an object system in which finalization is treated similarly to C++ destructors.

Objects can expect to have their finalizers explicitly invoked even when they are still live, long before they have become garbage.

One situation in which that happens is if an exception is thrown during the construction of an object.

I also have scoped reclamation construct with-objects which implements RAII. For instance:

    (with-objects ((foo (new foo ...))
                   (bar (new bar ...))
       ...)
The finalizers of foo and bar are invoked when with-objects terminates. Additionally, if foo throws during its construction, its finalizer is called, and if bar throws during its construction, both are called.

Now if those finalizers are not invoked by the time those objects become garbage, then GC will invoke them. Basically a finalizer is invoked and then removed, so if called early, then GC doesn't call it any more.

Situations in which it's undesirable or infeasible to use with-objects are covered by the call-finalizers API: a one-argument function which takes an object, and calls and removes its finalizer, allowing for completely manual finalization.

Of course, this is basically a logical extension of how with-open-file works in Common Lisp, formalized into the object system, integrated with finalizers.

There is a tradeoff: timely reclamation versus the risk created by the possibility of premature reclamation. It's easy to make objects which can be safely used after their finalizer has been called; the risk isn't that of a dangling pointer that will blow up the show. Still, things can malfunction when objects "hollowed out" by finalization are relied upon to still be functional.

> Other threads can have pointers into that thread's stack, so the stack must not be reclaimed until they let go.

> For instance, a tread allocates a reply box on the stack and calls some message passing API to send a message, whereby it registers the reply box. That API now has a pointer into the thread's stack. Suppose the thread dies before unregistering the reply box. If that can happen, the stack has to stick around.

OMG, can such thing really happen and be correct in some language? The thing is, the stack is most often used as method-local storage for that method's variables and objects that are not escaping its scope and thus can be allocated on stack instead of the heap.

Basically, what happens with stack in a language like C or Java when in thread T some method A calls another method B is the following:

  T.stack.push(return address)
  T.stack.allocateFrame(B)

Then, when B has done everything it wanted to, it clears its frame from the stack and returns by saved address into A. Similarly, when an exception is thrown, it crawls up the stack frame-by-frame clearing them out until it finds the handler.

With that general scheme in mind, I see some contradictions in your example:

1. How exactly can a thread correctly die in such a way that frame of method that allocated the reply box on-stack is not cleaned up from it first?

2. Why the method even allocated shared data structure on its own stack (which it must clear on returning to caller) and not in heap?

3. If it did so because it blocks while waiting for the response to appear in stack-allocated placeholder, then why would anyone consider thread which is actively waiting for something dead and try to reclaim its stack?

> How exactly can a thread correctly die in such a way that frame of method that allocated the reply box on-stack is not cleaned up from it first?

The thread could die incorrectly in that way.

> Why the method even allocated shared data structure on its own stack

Done for efficiency or when it's not desirable to have to check and recover from a failure to allocate such a structure. E.g. DECLARE_WAITQUEUE macro in Linux kernel:

https://elixir.bootlin.com/linux/v4.3/source/include/linux/w...

How blocking is implemented in Linux is that tasks declare wait queue nodes on the stack, register these into a queue, then change themselves to a sleep state and call the scheduler.

> why would anyone consider thread which is actively waiting for something dead and try to reclaim its stack?

What reclaims the stack isn't that "anyone"; it's garbage collection. It's plausible like this. Suppose the messaging API is responsible for dequeuing the waiter. The messaging API walks the queue, depositing replies into the reply boxes and dequeueing (without caring whether the associated threads are alive or dead).

When it dequeues the dead thread's reply box, that stack then becomes unreachable. Now it is eligible for reclamation.