Hacker News new | ask | show | jobs
by throwaway_pdp09 2021 days ago
Pretty sure no efficent run-time algo for general cycle detection exists. It has (AFAIK) been a big area in garbage collection for a long while, so smart people have been all over it.

Not sure if one exists for compile-time either.

2 comments

I don’t think so, either, but the only time a cycle can be introduced is if one modifies an object to make it point to a younger (created later in time) object.

In some languages, that’s relatively rare because modifying objects after they are created is rare; immutable objects can be part of cycles, but they can’t be the cause of cycles. That may be enough to make the overhead of cycle detection small.

I guess one could keep track of those ‘dirty’ links, and use them to avoid having to detect cycles in most cases.

It might be enough to add a single bit to each object-to-object pointer indicating “this reference was set after construction of the object” (could be an unused bit in a virtual address, or the least significant bit, so it wouldn’t introduce memory overhead).

With such bits, at first sight, one would only have to check for cycles when P references a _possibly_ younger Q, and destruction of some object (indirectly) decreases P’s ref count to 1. However, things probably would get hairy when objects may have many outgoing and/or incoming pointers (consider a set of 3 objects each referencing the other two).

I’ll go and read the referenced paper (https://researcher.watson.ibm.com/researcher/files/us-bacon/...) to see how it handles this.

How do GCs collect cycles then? I'm pretty sure most modern GCs handle cycles effortlessly.
It depends on what you call ‘effortlessly’.

And, technically, GCs collect non-garbage, and throw away what remains. They don’t care whether what they throw away contains cycles or not, but have to take care that they don’t keep running in circles when they follow links between objects while collecting the non-garbage.

Graph scans are trivial to implement though.
Would you consider full graph scan efficient? Would you like a full graph scan after every allocation, millions of times a second?

I don't understand why you said that.

The simplest way to do this is to use the low bit of the address (so long as you align your memory to at least 2 bytes this is fine as all addresses will end in 0) as a marker for whether you've already visited that pointer.

See https://en.wikipedia.org/wiki/Tracing_garbage_collection#Na%...

And yes, you're correct that GCs handle cycles without issue (that's a large part of the reason to use tracing GC over other memory management strategies like ObjC's auto-reference counting).

Don't know if any GCs use it, but if modifying the pointers is not possible then one alternative is Floyd's cycle-finding algorithm[1] which uses constant storage (two pointers).

[1]: https://en.wikipedia.org/wiki/Cycle_detection#Floyd.27s_Tort...

"Finding cycles" is not really the problem a GC is trying to solve. It's trying to find all of the non-reachable data, across a possibly cyclic graph. It just has to ensure that it doesn't loop infinitely while traversing that graph.
And that would be the reminder not to post before breakfast :)

Still, fun algorithm.

I purposefully used the word 'efficient' in my original post. Of course you can find cycles many ways, but floyd's (or similar) are not acceptably fast and never can be in general.
Swift leaves it in the hand of programmers and says "use weak references".

Otherwise traditional GC use a copying mechanism, what doesn't end up copied is dead.

The old reference counting GC in Nim had that extra copying pass when a type could have cycles.

That said, it's very easy to tell if a type can have cycles or not at compile-time, you just need to check if it refers to itself i.e. does the Node type has a Node field (or a field that can recursively contain a Node field).

> easy to tell if a type can have cycles or not at compile-time, you just need to check if it refers to itself

Certainly useful at times as I suppose it may be used to ensure you don't have cycles (I imagine that could be useful for resource management) but 'can have cycles' is definitely not 'does have cycles'. It's the latter that's important here.

And the ``{.acyclic.}`` hint applies to the later.

In the future we might have formal verification helping as well: https://nim-lang.org/docs/drnim.html

Fair question, was unclear.

Cycle detection at point of creation (when a pointer is set). You can do a full scan of data structure graph after every malloc but that's equivalent to your GC, and clearly horribly, unacceptably expensive.