Hacker News
by _flux 1831 days ago
Tracing GC is troublesome for any non-memory resource, such as a network connection or file handle, because it releases them at an unpredictable time. Otherwise I actually agree: reference counting is a GC mechanism. It is not a very good one, but it's the only one I'm aware of that works both for memory and resources.
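To make the "works for resources" point concrete, here is a minimal Rust sketch using `Rc`. The `Resource` type and its event log are hypothetical stand-ins for a real socket or file handle; the point is only that the close happens at the exact moment the last handle goes away.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a non-memory resource such as a socket.
/// It records its close in a shared log so the ordering is observable.
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("close {}", self.name));
    }
}

/// Runs the scenario and returns the recorded events in order.
fn refcount_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let first = Rc::new(Resource { name: "socket", log: Rc::clone(&log) });
    let second = Rc::clone(&first); // bumps the count to 2

    drop(first); // count 2 -> 1, the resource stays open
    log.borrow_mut().push("first handle dropped".to_string());

    drop(second); // count 1 -> 0, Drop runs right here
    log.borrow_mut().push("second handle dropped".to_string());

    let events = log.borrow().clone();
    events
}

fn main() {
    for event in refcount_demo() {
        println!("{event}");
    }
}
```

The close lands between the two drops, not at some later collection point, which is the property a tracing GC cannot offer.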

I would enjoy seeing someone test a model where the type system guarantees (or at least lets you detect) that you cannot store such non-memory objects behind a traced GC node (these would include plain memory objects that need to be registered/unregistered precisely).
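One way such a model might look, sketched in Rust. Everything here is hypothetical: `Traced` is an invented marker trait for types that own only plain memory, and `Gc` is a `Box`-backed stand-in with no real collector behind it. It only illustrates how a trait bound could keep resource-owning types out of the traced heap.

```rust
/// Hypothetical marker: implemented only for types that own plain memory.
trait Traced {}
impl Traced for i32 {}
impl Traced for String {}
impl<T: Traced> Traced for Vec<T> {}

/// Stand-in for a GC-managed pointer; a real collector would live behind this.
struct Gc<T: Traced>(Box<T>);

impl<T: Traced> Gc<T> {
    fn new(value: T) -> Self {
        Gc(Box::new(value))
    }

    fn get(&self) -> &T {
        &self.0
    }
}

/// Owns a non-memory resource, so it deliberately does NOT implement Traced.
struct FileHandle {
    fd: i32,
}

fn demo() -> usize {
    let names = Gc::new(vec!["a".to_string(), "b".to_string()]);

    // Uncommenting the next line fails to compile with
    // error[E0277]: the trait bound `FileHandle: Traced` is not satisfied
    // let bad = Gc::new(FileHandle { fd: 3 });

    // The resource has to live outside the traced heap instead.
    let handle = FileHandle { fd: 3 };
    let _ = handle.fd;

    names.get().len()
}

fn main() {
    println!("traced items: {}", demo());
}
```

Whether this would be pleasant to use at scale is exactly the open question; the sketch only shows that the check itself is expressible.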

It might be that it would be needlessly annoying to use compared to plain RC. Or maybe it would be the best of both worlds?

Not when the language also supports value types and region allocators (e.g. IDisposable in .NET).

You can even turn it into RAII proper, by making it a compilation error to not handle those interfaces properly.

Again with .NET, there are SafeHandles as well, alongside the Marshal interop APIs.

This is nothing new, actually: Mesa/Cedar at Xerox PARC used reference counting with a cycle collector, while other descendant languages down to Modula-3 and Active Oberon always combined value types, tracing GC and C++-like resource-handling capabilities.

Oh, Common Lisp also has similar capabilities, especially its ZetaLisp predecessor from the Lisp Machines.

Then there is Eiffel: not only did it have this, it was also probably the first Algol-like language to support non-nullable references.

Sadly they decided to ignore all of this in Java, and then its world domination kind of made everyone else ignore it as well.

Thankfully even Java is improving its story in this regard, while languages like D, Nim and yes, .NET, show what has already been available for several decades.

I must be missing something. How is it possible to precisely collect a resource with tracing GC? And if you need to update counters when you make duplicates of object references, you are not using a tracing GC where the benefits are the cheap duplication of object references, cheap allocations and cheap (batched) releases, but the downside is not being able to precisely and automatically do it when the value is available for collection.

Seems to me it is impossible to have both automatic precise release of resources and collection-based GC?

As I understand it, even the documentation for IDisposable in .NET says as much at https://docs.microsoft.com/en-us/dotnet/api/system.idisposab...:

> The primary use of this interface is to release unmanaged resources. The garbage collector automatically releases the memory allocated to a managed object when that object is no longer used. However, it is not possible to predict when garbage collection will occur. Furthermore, the garbage collector has no knowledge of unmanaged resources such as window handles, or open files and streams.

> Use the Dispose method of this interface to explicitly release unmanaged resources in conjunction with the garbage collector. The consumer of an object can call this method when the object is no longer needed.

So this is the interface you can use to explicitly release a resource, because the GC gets around to it only later at some unspecified time.

About SafeHandle it says at https://docs.microsoft.com/en-us/dotnet/api/system.runtime.i...:

> The SafeHandle class provides critical finalization of handle resources, preventing handles from being reclaimed prematurely by garbage collection and from being recycled by Windows to reference unintended unmanaged objects.

It doesn't seem to be at all helpful for automatic precise release of resources.

> the benefits are the cheap duplication of object references, cheap allocations and cheap (batched) releases, but the downside is not being able to precisely and automatically do it when the value is available for collection.

Note that you don't need GC to reap these benefits, if desired. You can allocate an arena and do secondary allocations inside it, then deallocate everything in a single operation. Arena deallocation is not timely or precise, but it does happen deterministically.
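A minimal sketch of the arena idea in Rust, assuming a toy `Arena` type (hypothetical, not from any library) that stores values in one `Vec` and hands out indices. Allocation is a cheap push, "references" are plain copyable indices, and everything is freed in one batch when the arena goes out of scope.

```rust
/// Toy arena: values live in one Vec and are referred to by index.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Cheap allocation: push and return the new index.
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }

    fn get(&self, id: usize) -> &T {
        &self.items[id]
    }
}

/// A linked-list node whose link is an arena index (cheap to duplicate).
struct Node {
    value: i32,
    next: Option<usize>,
}

/// Builds a list 1..=n inside an arena, sums it, then frees it all at once.
fn sum_list(n: i32) -> i32 {
    let mut arena = Arena::new();
    let mut head = None;
    for value in 1..=n {
        head = Some(arena.alloc(Node { value, next: head }));
    }

    let mut total = 0;
    let mut cursor = head;
    while let Some(id) = cursor {
        let node = arena.get(id);
        total += node.value;
        cursor = node.next;
    }
    total
    // `arena` is dropped here: every node is released in a single batch.
}

fn main() {
    println!("sum = {}", sum_list(4));
}
```

The release point is deterministic (end of `sum_list`), but no individual node is released before the rest, which is the "not timely or precise" part.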

True, but GC gives those benefits automatically, compared to a naive program doing e.g. RC-based memory management.

And there is of course the question of safety; should you release an arena too early, you may have introduced a bug. Worse: it might not crash immediately.

There is actually some work on doing arena management automatically, called region inference: http://www.mlton.org/Regions

But the way I see it, it's just a way to make memory management even more efficient; it's not about precise release of resources, and indeed not all programs can be expressed so that releases can happen only in batches of an arena (assuming those arenas themselves aren't dynamically managed, which certainly is a valid strategy as well, but manual).

> should you release an arena too early, you may have introduced a bug.

A memory-safe programming language will detect any such bugs and reject the program. This is not hard; it's a clean application of existing lifetime checks.

So are there some languages that do it? I'm sure the devil is in the details.
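Rust's borrow checker is one existing instance of such a check, at least for the simple case. A rough sketch, with a hypothetical `Arena` type: a reference obtained from the arena borrows the arena itself, so releasing the arena while that reference is still in use is a compile-time error.

```rust
/// Toy arena that owns its strings (hypothetical, for illustration only).
struct Arena {
    items: Vec<String>,
}

impl Arena {
    /// The returned reference borrows from `self`, so it cannot outlive the arena.
    fn get(&self, id: usize) -> &str {
        &self.items[id]
    }
}

fn first_len() -> usize {
    let arena = Arena {
        items: vec!["alpha".to_string(), "beta".to_string()],
    };
    let first: &str = arena.get(0);

    // Releasing the arena too early is rejected. Uncommenting the next line gives
    // error[E0505]: cannot move out of `arena` because it is borrowed
    // drop(arena);

    let len = first.len(); // last use of the borrow

    drop(arena); // fine here: no reference into the arena is live any more
    len
}

fn main() {
    println!("length = {}", first_len());
}
```

The details do get harder once arenas are created and released dynamically, but the basic "too early" bug is caught statically.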
You aren't reading it properly. The documentation you quote covers the case where you leave the work to the GC; you can take it over yourself, C++ RAII style:

   {
      using var my_socket = new NetworkSocket();
      // ... use my_socket here ...
   }

   // my_socket has been disposed by the time execution reaches here

Or even better, if NetworkSocket is a struct, it gets stack allocated: zero GC.
So how about this then:

    {
      using var my_socket = new NetworkSocket();
      my_socket.write("Started");
      register_callback(() => my_socket.write("Finished"));
    }
This is the case that RC solves well and tracing GC doesn't solve at all, regardless of the number of interfaces you implement. It is easy to find yourself in this situation given how often callbacks are used in modern codebases.

    NetworkComponent foo = new NetworkComponent();

    {
       using var my_socket = new NetworkSocket();
       foo.socket = my_socket;
    }

    foo.do_sth_with_socket(); // oops, runtime failure, socket closed
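For comparison, a sketch of how the callback case above plays out with reference counting, written in Rust with `Rc`. The `NetworkSocket` type, its `write` method and the callback list are hypothetical stand-ins (the list plays the role of `register_callback`); the socket only logs what happens to it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical socket: `write` and the close in `Drop` just append to a log.
struct NetworkSocket {
    log: Log,
}

impl NetworkSocket {
    fn write(&self, msg: &str) {
        self.log.borrow_mut().push(format!("write {msg}"));
    }
}

impl Drop for NetworkSocket {
    fn drop(&mut self) {
        self.log.borrow_mut().push("close".to_string());
    }
}

fn callback_demo() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    // Stand-in for register_callback: callbacks waiting to be run later.
    let mut callbacks: Vec<Box<dyn FnOnce()>> = Vec::new();

    {
        let my_socket = Rc::new(NetworkSocket { log: Rc::clone(&log) });
        my_socket.write("Started");
        let for_callback = Rc::clone(&my_socket); // second owner
        callbacks.push(Box::new(move || for_callback.write("Finished")));
    } // my_socket leaves scope, but the callback's handle keeps the socket open

    log.borrow_mut().push("scope exited".to_string());

    for callback in callbacks {
        callback(); // the captured handle is dropped once the callback has run
    }

    let events = log.borrow().clone();
    events
}

fn main() {
    for event in callback_demo() {
        println!("{event}");
    }
}
```

The socket is closed right after the callback finishes, not at the end of the inner scope and not at some later collection.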
Trying to be clever?

Here is your Rust version, enjoy.

    use std::io;

    struct NetworkComponent {
        socket: NetworkSocket,
    }

    impl NetworkComponent {
        fn new() -> NetworkComponent {
            println!("Creating NetworkComponent");
            NetworkComponent {
                socket: NetworkSocket {},
            }
        }

        fn do_sth_with_socket(&self) {}
    }

    impl Drop for NetworkComponent {
        fn drop(&mut self) {
            println!("Dropping NetworkComponent");
        }
    }

    struct NetworkSocket {}

    impl Drop for NetworkSocket {
        fn drop(&mut self) {
            println!("Dropping NetworkSocket");
        }
    }

    fn main() -> io::Result<()> {
        let mut foo = NetworkComponent::new();

        {
            let socket = NetworkSocket {};
            foo.socket = socket;
        }

        foo.do_sth_with_socket(); // oops, runtime failure, socket closed

        Ok(())
    }
https://play.rust-lang.org/?version=stable&mode=debug&editio...
And what did you try to prove here? There is no use-after-free and no runtime error in this Rust code. The socket stays valid from the moment of its creation and for the whole lifetime of the network component. It gets moved out of the nested scope properly and gets closed after leaving the outer scope, when the NetworkComponent struct is dropped.

The "oops" comment is invalid in your Rust example because the socket is still valid at that point.

Which is totally different from what would happen in C#, where you'd get a use-after-free bug (actually use-after-close).

Try-with-resources is not RAII. It is a lot weaker.