by cm2187 1966 days ago
But GC.Collect is quite expensive. I think the parent question was: I have a large object in memory, and I want to delete it now so that I can load another large object without running into memory limits. But I don't want to call GC.Collect, which will stop my whole application, interfere with the garbage collector's heuristics, and do all sorts of unnecessary steps.

I had instances where I knew only one of these large structures could fit in memory and had to call GC.Collect before allocating a new one, as I would get an OutOfMemoryException before the garbage collector would kick in by itself.

You can do that with unmanaged objects, but it doesn't look like you can with managed objects (other than GC.Collect, which, going by other videos I saw, is not recommended by Microsoft).

3 comments

There's an inherent issue with doing that while still being safe: what if there's still a reference to your "big object" somewhere? The only way for the runtime to know for certain it's safe to delete the object is to effectively run the GC anyway. The alternative is that the object gets deleted without any checks and any references to it will now (probably) cause a crash - and it'll be a hard native crash rather than a .net exception since it's outright accessing invalid memory rather than just a "managed" null pointer.

There's probably something to be said for allowing such a thing in explicitly "unsafe" code, but not in the normal runtime. In fact, it might even be possible now by leveraging some of the existing unsafe bits in .net.

True, but all it takes to fix this is to add a check on the reference count of the object. You only need a full scan to handle circular references. You could have a GC.Collect(obj) which would collect this object and all of its dependencies, provided their reference counts have gone to zero, and otherwise do nothing until the next full garbage collection.

There are GCs that take this approach (refcount + sweeps to break cycles). It has tradeoffs, however.

- Extra space for every object to have a refcount

- Extra refcount bookkeeping every time you (re)assign a reference (possibly triggering cache thrashing/false sharing in some multithreaded scenarios), xchgs instead of movs, etc.

- Pointless if you're using bump allocators (they can't reuse the 'freed' memory until the next GC cycle compacts memory anyways), so you're forced to use more complicated allocator designs if you're to reuse said memory.

Cycles mean it still doesn't give you 100% deterministic object destruction either, so you still want extra mechanisms for disposing of unmanaged resources at controlled times, à la IDisposable.
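CPython happens to be one runtime that combines reference counting with a separate cycle collector, so the behaviour described above can be observed directly. A minimal sketch, assuming CPython specifically (other Python implementations may free objects at different times); `Node` and `demo` are hypothetical names used only for illustration:

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a large object that may reference another one."""

    def __init__(self, name):
        self.name = name
        self.other = None


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # stop the cycle collector from running on its own during the demo
    try:
        # Case 1: no cycle. Dropping the last reference frees the object at once.
        a = Node("acyclic")
        a_ref = weakref.ref(a)
        del a
        freed_by_refcount = a_ref() is None

        # Case 2: two objects referencing each other. Their refcounts never
        # reach zero, so refcounting alone cannot free them.
        b = Node("b")
        c = Node("c")
        b.other = c
        c.other = b
        b_ref = weakref.ref(b)
        del b, c
        cycle_still_alive = b_ref() is not None

        # Only a sweep by the cycle collector reclaims the pair.
        gc.collect()
        cycle_freed_by_sweep = b_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_still_alive, cycle_freed_by_sweep
```

The acyclic object goes away the moment its last reference is dropped, while the cyclic pair lingers until a collection runs, which is why refcounting on its own does not give fully deterministic destruction.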

There's no reference count on the object. That's not how garbage collection works in .NET, or Java for that matter.

The specifications don't define which algorithms are to be used, and at least as far as Java is concerned, some implementations have experimented with reference-counting GCs.

https://www.microsoft.com/en-us/research/wp-content/uploads/...

But to get the reference count you’d need to run the entire mark phase.

If you have large objects which you know you want to deallocate, the easiest way to speed things up is to effectively do manual allocation on top of the GC.

When you're done with an object (it will almost certainly be an array), push it onto a stack; when you want to allocate, try to pop one off first before allocating fresh. Use a stack per thread or locking as appropriate; use multiple stacks with bucketing by size if there's a lot of variance. Use suballocation to reduce reallocating, e.g. allocate 1MB, 2MB, 4MB and so on arrays and keep track of the length separately via a slice (array segment).

(ArrayPool in .net encapsulates most of this for you these days, but it's a thing I implemented myself back in .net 2 days.)
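The stack-per-bucket idea above can be sketched roughly as follows. This is an illustration in Python rather than .NET, and it is not how ArrayPool is implemented; `BufferPool`, `rent`, and `give_back` are hypothetical names, the pool is not thread-safe, and returned buffers are not cleared:

```python
from collections import defaultdict


class BufferPool:
    """Hypothetical size-bucketed pool of reusable bytearrays (not thread-safe)."""

    def __init__(self, min_size=1 << 20):
        self._min_size = min_size
        # Bucket capacity -> stack of free buffers of exactly that capacity.
        self._free = defaultdict(list)

    def _bucket(self, length):
        # Round up to min_size, 2 * min_size, 4 * min_size, and so on.
        size = self._min_size
        while size < length:
            size *= 2
        return size

    def rent(self, length):
        """Return a view of exactly `length` bytes over a pooled buffer."""
        capacity = self._bucket(length)
        stack = self._free[capacity]
        # Reuse a free buffer if one exists; allocate fresh only on a miss.
        buf = stack.pop() if stack else bytearray(capacity)
        # The slice tracks the logical length separately from the capacity.
        return memoryview(buf)[:length]

    def give_back(self, view):
        """Push the buffer behind `view` back onto its bucket's stack."""
        buf = view.obj  # the underlying bytearray
        view.release()
        self._free[len(buf)].append(buf)


pool = BufferPool(min_size=1024)
first = pool.rent(1500)    # backed by a fresh 2048-byte buffer
pool.give_back(first)
second = pool.rent(2000)   # same bucket, so the 2048-byte buffer is reused
```

Because the second request lands in the same bucket, no new large allocation is made and the garbage collector never has to reclaim the first buffer.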

> I had instances where I knew only one of these large structures could fit in memory and had to call GC.Collect before allocating a new one, as I would get an OutOfMemoryException before the garbage collector would kick in by itself.

That's a GC bug surely?