Hacker News
by boredprograming 1831 days ago
One day, Rust will need a GC. Reference counting is just a crappy GC. A modern GC can perform better than this, so Rust is actually hurting its own performance by not having one.

A good GC would make heavily concurrent apps much easier to build with Rust, and it would perform better than the typical `Arc<Mutex<T>>` objects passed around right now.
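
For readers who have not seen it, the `Arc<Mutex<T>>` pattern mentioned above looks roughly like the following minimal sketch. The function name `parallel_count` is made up for illustration; only standard library types are used.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add 1 to a shared counter `per_worker` times.
/// (`parallel_count` is a hypothetical helper, not part of any library.)
fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc performs an atomic reference count increment.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // Every update takes the lock.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1_000));
}
```

Each clone and drop of the `Arc` touches an atomic counter, and each access goes through the lock, which is the overhead the comment is pointing at.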

3 comments

Tracing GC is troublesome for any non-memory resource, such as a network connection or file handle, because release is untimely. Otherwise I actually agree: reference counting is a GC mechanism (not a very good one), but it's the only one I'm aware of that works for both memory and resources.

I would enjoy seeing someone test a model where the type system guarantees (or at least lets you detect) that you cannot store such non-memory objects behind a traced GC node. These would include plain memory objects that need to be registered/unregistered precisely.

It might turn out to be needlessly annoying to use compared to plain RC. Or maybe it would be the best of both worlds?
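
One way such a model could look, as a rough sketch only: an opt-in marker trait that a traced pointer requires. Everything here (`MemoryOnly`, `Gc`, `Socket`) is hypothetical, and `Gc` is a plain `Box` stand-in with no actual collector behind it.

```rust
/// Hypothetical marker: "this type owns only memory, so late collection is harmless".
trait MemoryOnly {}

impl MemoryOnly for u64 {}
impl MemoryOnly for String {}
impl<T: MemoryOnly> MemoryOnly for Vec<T> {}

/// Stand-in for a traced GC pointer; a real one would register with a collector.
struct Gc<T: MemoryOnly> {
    value: Box<T>,
}

impl<T: MemoryOnly> Gc<T> {
    fn new(value: T) -> Self {
        Gc { value: Box::new(value) }
    }

    fn get(&self) -> &T {
        &self.value
    }
}

/// A resource that needs timely release; deliberately NOT `MemoryOnly`.
struct Socket;

impl Drop for Socket {
    fn drop(&mut self) {
        println!("socket closed");
    }
}

fn main() {
    let names = Gc::new(vec![String::from("a"), String::from("b")]);
    println!("{}", names.get().len());

    // Rejected at compile time, because `Socket` does not implement `MemoryOnly`:
    // let bad = Gc::new(Socket);

    let _socket = Socket; // resources stay under scope-based (or RC) ownership
}
```

A manual marker trait can still be implemented incorrectly by hand for a struct that contains a socket, so a serious version would need compiler support or a derive that checks every field.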

Not when the language also supports value types and region allocators (e.g. IDisposable in .NET).

You can even turn it into RAII proper by making it a compilation error to not handle those interfaces properly.

Again with .NET, there are SafeHandles as well, alongside the MarshalInterop APIs.

This is nothing new, actually. Mesa/Cedar at Xerox PARC used reference counting with a cycle collector, while descendant languages down to Modula-3 and Active Oberon always combined value types, tracing GC and C++-like resource handling capabilities.

Oh, Common Lisp also has similar capabilities, especially its ZetaLisp predecessor from the Lisp Machines.

Then Eiffel not only had this, it was also probably the first Algol-like language to support non-nullable references.

Sadly they decided to ignore all of this in Java, and then its world domination kind of made everyone else ignore it as well.

Thankfully even Java is improving its story in this regard, while languages like D, Nim and yes, .NET, show what has already been available for several decades.

I must be missing something. How is it possible to precisely collect a resource with tracing GC? And if you need to update counters when you make duplicates of object references, you are not using a tracing GC where the benefits are the cheap duplication of object references, cheap allocations and cheap (batched) releases, but the downside is not being able to precisely and automatically do it when the value is available for collection.

It seems to me that it is impossible to have both automatic precise release of resources and collection-based GC?

As I understand it, even the documentation for IDisposable in .NET says as much at https://docs.microsoft.com/en-us/dotnet/api/system.idisposab...:

> The primary use of this interface is to release unmanaged resources. The garbage collector automatically releases the memory allocated to a managed object when that object is no longer used. However, it is not possible to predict when garbage collection will occur. Furthermore, the garbage collector has no knowledge of unmanaged resources such as window handles, or open files and streams.

> Use the Dispose method of this interface to explicitly release unmanaged resources in conjunction with the garbage collector. The consumer of an object can call this method when the object is no longer needed.

So this is the interface you can use to explicitly release a resource, because the GC gets around to it only later at some unspecified time.

About SafeHandle it says at https://docs.microsoft.com/en-us/dotnet/api/system.runtime.i...:

> The SafeHandle class provides critical finalization of handle resources, preventing handles from being reclaimed prematurely by garbage collection and from being recycled by Windows to reference unintended unmanaged objects.

That doesn't seem at all helpful for automatic precise release of resources.

> the benefits are the cheap duplication of object references, cheap allocations and cheap (batched) releases, but the downside is not being able to precisely and automatically do it when the value is available for collection.

Note that you don't need GC to reap these benefits, if desired. You can allocate an arena and do secondary allocations inside it, then deallocate everything in a single operation. Arena deallocation is not timely or precise, but it does happen deterministically.
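
As a rough illustration of the arena idea, here is a minimal sketch built on a `Vec`. The names `Arena`, `Id` and `build_and_sum` are invented for this example; real arena crates differ in API and implementation.

```rust
/// A tiny arena: values are pushed into one Vec and freed together.
struct Arena<T> {
    items: Vec<T>,
}

/// Handle into the arena; copying it is a plain integer copy, no counter update.
#[derive(Clone, Copy)]
struct Id(usize);

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Secondary allocation inside the arena: just a push.
    fn alloc(&mut self, value: T) -> Id {
        self.items.push(value);
        Id(self.items.len() - 1)
    }

    fn get(&self, id: Id) -> &T {
        &self.items[id.0]
    }
}

/// Allocates the numbers 1..=n in an arena and sums them through their handles.
fn build_and_sum(n: u64) -> u64 {
    let mut arena = Arena::new();
    let ids: Vec<Id> = (1..=n).map(|v| arena.alloc(v)).collect();
    let sum: u64 = ids.iter().map(|&id| *arena.get(id)).sum();
    // `arena` is dropped when this function returns: every allocation
    // is released in one step, at a point that is known statically.
    sum
}

fn main() {
    println!("{}", build_and_sum(4));
}
```

Nothing is freed individually, which is why release is not timely per object, but the point at which the whole arena goes away is fixed by the program structure.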

True, but GC gives those benefits automatically, compared to a naive program doing e.g. RC-based memory management.

And there is of course the question of safety; should you release an arena too early, you may have introduced a bug. Worse: it might not crash immediately.

There is actually some work for doing arena management automatically, called region inference: http://www.mlton.org/Regions

But the way I see it, it's just a way to make memory management even more efficient; it's not about precise release of resources, and indeed not all programs can be expressed so that releases can happen only in batches of an arena (assuming those arenas themselves aren't dynamically managed, which certainly is a valid strategy as well, but manual).

> should you release an arena too early, you may have introduced a bug.

A memory-safe programming language will detect any such bug and reject the program. This is not hard; it's a clean application of existing lifetime checks.
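
A minimal sketch of that check in Rust, assuming a toy `Arena` type made up for this example: a reference handed out by the arena borrows from it, so releasing the arena while the reference is still in use does not compile.

```rust
/// Toy arena holding strings (hypothetical type for illustration).
struct Arena {
    items: Vec<String>,
}

impl Arena {
    /// The returned reference borrows from `self`, so it cannot outlive the arena.
    fn first(&self) -> &str {
        &self.items[0]
    }
}

fn first_len() -> usize {
    let arena = Arena {
        items: vec![String::from("hello")],
    };
    let s = arena.first();
    let n = s.len();

    // Uncommenting the next two lines makes the program fail to compile,
    // because `arena` would be moved (released) while `s` still borrows it:
    // drop(arena);
    // println!("{}", s);

    n
}

fn main() {
    println!("{}", first_len());
}
```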

You aren't reading it properly. The documentation you quote covers the case where you leave the work to the GC; you can instead do it yourself, C++ RAII style:

   {
      using var my_socket = new NetworkSocket();
      // use my_socket here
   }

   // my_socket has been disposed by the time execution reaches here

Or even better, if NetworkSocket is a struct, it gets stack allocated, zero GC.

So how about this then:

    {
      using var my_socket = new NetworkSocket();
      my_socket.write("Started");
      register_callback(() => my_socket.write("Finished"));
    }

This is the case that RC solves well and tracing GC doesn't solve at all, regardless of the number of interfaces you implement. It is easy to find yourself in this situation given how heavily callbacks are used in modern codebases.

    NetworkComponent foo = new NetworkComponent();

    {
       using var my_socket = new NetworkSocket();
       foo.socket = my_socket;
    }

    foo.do_sth_with_socket(); // oops, runtime failure, socket closed
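
For comparison, here is a rough sketch of how reference counting handles the callback case, written in Rust with `Rc`. The `Socket` type, its `log` field and the `run` function are made up for illustration; the log exists only so the order of events can be inspected.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical socket that records what happens to it in a shared log.
struct Socket {
    log: Rc<RefCell<Vec<String>>>,
}

impl Socket {
    fn write(&self, msg: &str) {
        self.log.borrow_mut().push(format!("write: {}", msg));
    }
}

impl Drop for Socket {
    fn drop(&mut self) {
        // Runs exactly when the last reference to the socket goes away.
        self.log.borrow_mut().push(String::from("closed"));
    }
}

fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut callbacks: Vec<Box<dyn Fn()>> = Vec::new();
    {
        let socket = Rc::new(Socket { log: Rc::clone(&log) });
        socket.write("Started");
        let captured = Rc::clone(&socket); // count goes from 1 to 2
        callbacks.push(Box::new(move || captured.write("Finished")));
    } // `socket` goes out of scope: count drops to 1, the socket stays open

    for callback in &callbacks {
        callback();
    }
    drop(callbacks); // last reference released: the socket closes right here

    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The socket outlives the block because the callback holds a counted reference, and it is closed at the moment the last callback is dropped rather than at some later collection.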

Trying to be clever?

Here is your Rust version, enjoy.

    use std::io;

    struct NetworkComponent {
        socket: NetworkSocket,
    }

    impl NetworkComponent {
        fn new() -> NetworkComponent {
            println!("Creating NetworkComponent");
            NetworkComponent {
                socket: NetworkSocket {},
            }
        }

        fn do_sth_with_socket(&self) {
            // Placeholder: a real implementation would use self.socket here.
        }
    }

    impl Drop for NetworkComponent {
        fn drop(&mut self) {
            println!("Dropping NetworkComponent");
        }
    }

    struct NetworkSocket {}

    impl Drop for NetworkSocket {
        fn drop(&mut self) {
            println!("Dropping NetworkSocket");
        }
    }

    fn main() -> io::Result<()> {
        let mut foo = NetworkComponent::new();

        {
            let socket = NetworkSocket {};
            // Moves `socket` into `foo`; the previous socket is dropped here.
            foo.socket = socket;
        }

        // The moved socket is owned by `foo`, so it is still open at this point.
        foo.do_sth_with_socket();

        Ok(())
    }
https://play.rust-lang.org/?version=stable&mode=debug&editio...

I would need some data on that, but I have to say that it always makes me laugh when people only talk about the GC in threads about D. It's so good for productivity. I don't really like it, but I can't describe just how much of a non-issue it is for us (the company I work for).

Tracing GC has poor memory performance because it has to access rarely used or swapped-out pages to scan them for pointers. And of course, peak memory use is much higher since it doesn't free everything as soon as possible.

There may be advantages if you can use it to add compaction, but I don't think you need a GC to do that necessarily.

Actually it is the other way around.

https://github.com/ixy-languages/ixy-languages

No wonder that M1 has specific architecture optimizations that help streamline ARC boilerplate code, while Swift 5.5 will bring more aggressive optimizations (disabled by default, because an application can crash if weak/owned references are annotated improperly; see the WWDC 2021 talk).

This isn't representative of application code and there isn't even any mention of the metrics I mentioned…

> No wonder that M1 has specific architecture optimizations that help streamline ARC boilerplate code

No it doesn't. I told you it didn't the last time you said this.

> This isn't representative of application code and there isn't even any mention of the metrics I mentioned…

Yeah, that is the usual answer when benchmarks prove how much of an urban myth reference counting performance actually is.

> No it doesn't. I told you it didn't the last time you said this.

Did you?

There is more important stuff in life to store in my brain than a list of who replies to me on Hacker News.

Anyway,

https://github.com/apple/swift/blob/main/stdlib/public/Swift...

https://twitter.com/ErrataRob/status/1331735383193903104

> Yeah, that is the usual answer when benchmarks prove how much of an urban myth reference counting performance actually is.

CPU/wall time benchmarks are not that relevant to system performance (seriously!) because second-order effects matter more. But if you had peak memory and page demand graphs that would matter.

For a network driver I don't know if it'd really look any different though. That's mostly arena allocations.

> https://twitter.com/ErrataRob/status/1331735383193903104

The fast atomics and JavaScript instructions do exist but aren't "special", they're just part of the ARM ISA.

Apple's recent atomics are almost magically fast, though.

Thanks for sharing these links! Super interesting. I do have a question though. The ixy benchmarks seem to imply that RC is generally slower than GC (Go and C# are much faster than Swift and are only outdone by languages with manual memory management).

However in the tweet thread you shared, the poster said

> all that reference counting overhead (already more efficient than garbage collection) gets dropped in half.

That implies reference counting is actually more efficient. I don't know how to reconcile these two observations. Do you have any insights?

That observation is made from the point of view of Swift developers.

The only reason Swift has reference counting is historical.

The Objective-C GC implementation failed because it was very hard to mix frameworks compiled with and without GC enabled, on top of the usual issues of C memory semantics.

https://developer.apple.com/library/archive/documentation/Co...

Check the "Inapplicable Patterns" section.

So Apple made the right design decision. Instead of trying to fix tracing GC in such an environment, they did what Microsoft does in COM: they looked at the Cocoa [retain/release] pattern, automated it, and in a marketing swoop sold that solution as ARC.

Swift, as the Objective-C replacement, naturally had to build on top of ARC as a means to keep compatibility with the Objective-C runtime without additional overhead (check RCW/CCW for how the .NET GC deals with COM).

Here is a paper about Swift performance,

http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf

> As shown in the figure, performing RC operations takes on average 42% of the execution time in client programs, and 15% in server programs. The average across all programs can be shown to be 32%. The Swift compiler does implement optimization techniques to reduce the number of RC operations similar to those described in Section 2.2.2. Without them, the overhead would be higher. The RC overhead is lower in server programs than in client programs. This is because server programs spend relatively less time in Swift code and RC operations; they spend relatively more time in runtime functions for networking and I/O written in C++.

It makes technical sense that Swift uses reference counting, as explained above, but it isn't due to performance. That story just sells better than explaining that it was due to the C memory model Objective-C inherited, which, besides the memory corruption problems, doesn't allow for anything better than a conservative garbage collector with very bad performance.

https://hboehm.info/gc/