Hacker News new | ask | show | jobs
by kps 1148 days ago
Detecting a cycle in a directed graph is useful (in certain domains I've worked in), but doesn't have a cute trick answer.

[Edit: I in mind ‘Detecting the cycles’ rather than ‘Detecting a cycle’ but wrote the wrong thing. Mea culpa.]

Your actual job will be more like implementing a maximally performant directed graph in safe Rust. (Not really. It won't be that interesting.)

1 comments

> Detecting a cycle in a directed graph is useful (in certain domains I've worked in), but doesn't have a cute trick answer.

Doesn't the standard cute trick answer for a linked list (tortoise and hare) work fine? Do two parallel breadth-first traversals of the graph starting at the root node, one at twice the speed of the other. (If there's multiple root nodes, introduce a unique super-root that points to the roots.) If there's at least one cyclic path, both iterations will enter the same cyclic path because iteration order is the same, at which point it reduces to the linked list problem.

Yes, I phrased my comment incorrectly; I had in mind identifying (all) the cycles, and that for a linked list they're the same thing. For a DAG it doesn't generalize because you can't find more cycles without storage to exclude the earlier ones.
Fair enough. I imagine a simple (but O(cycles) space, not O(1)) way to do this is to still tortoise-and-hare, and on detecting a cycle, break the link behind you (by recording it in a table) and continue the BFS. If O(cycles) = O(n) this isn't terribly interesting compared to other approaches, but if O(cycles) < O(n) it might be helpful.
Actually, now that I spend more than ten seconds thinking about this, I think this requires O(1) space; you only need to remember the single most recent link behind you to avoid following it again. Remembering a single break is sufficient to get you to either the next cycle or (if no more) terminate the BFS; and once you detect the next cycle you can safely remember only /its/ break since BFS will never take you back to the first cycle. Assuming streaming output (e.g. print identification of each cycle as you detect it -- including e.g. printing each node until you see the break point, which will identify each node that is part of the cycle), this is O(n) time and O(1) space, which I believe is optimal.
if you can mutate the graph and mark a node as traversed that's the easiest way. If you can't then save the addresses to a hash set. Both are extra memory but you don't have to deal with "parallel BFS," which honestly is just over complicated imo.

With "parallel bfs" comparing "layers" of traversal has runtime cost that effects the big oh so I think the above solution is better overall even though it feels cheap.

But the whole point of the "cute solution" is to solve with O(1) memory; if you're willing to allow O(n) then the linked list also admits more sane solutions.

I'm not sure what you mean by comparing "layers" of traversal. Tortoise and hare is about provide a termination condition in the case that a cycle exists. In the case that there's no cycle, every algorithm must inspect each node, so is O(n). In both the linked list and the DAG case, if there is at least one cycle, (a) both pointers will enter it in at most O(n) time, and they will match in at most O(n) additional time, so this stays big-oh optimal.

I'd argue the point of the exercise isn't to implement an O(1) solution, but to verify that the applicant understands how the logic they're writing actually executes, and is capable of minimizing complexity.

Whether they arrive at an O(1) solution doesn't matter a bit to me. Most of the time, I don't even care if they phrase it in big-O notation - or have even heard the term!

Given the choice between someone who completed the exercise with an O(1) solution in half the allotted time and someone who never wrote a single line of valid code, I'll choose the person who is able to articulate the problem and potential solutions best every time.

It's far easier to search the web and find a "cute solution" than it is to learn how to think about and communicate complexity.

> But the whole point of the "cute solution" is to solve with O(1) memory; if you're willing to allow O(n) then the linked list also admits more sane solutions.

Yeah but the "parallel BFS" solution is actually slower. Overall people sacrifice memory for speed in these algorithm problems.

Additionally the input is already O(N) so if you count the input as memory the addition of O(2N) doesn't shift it. If you don't count it well you're deliberately removing real world actuality and making the problem harder.

>I'm not sure what you mean by comparing "layers" of traversal. Tortoise and hare is about provide a termination condition in the case that a cycle exists. In the case that there's no cycle, every algorithm must inspect each node, so is O(n). In both the linked list and the DAG case, if there is at least one cycle, (a) both pointers will enter it in at most O(n) time, and they will match in at most O(n) additional time, so this stays big-oh optimal.

It's layer based traversal via BFS. You have to compare the layers produced by the tortoise and the hare. Let's see for a binary tree:

  For traversal: 
  hare: O(N)
  tortoise: O(N)
  There are N total nodes so traversal via bfs or dfs will always be N.
BFS traversal happens in layers of breadth. The height of a full binary tree is log2(N + 1) - 1. The length of the last layer is 2^height. So the amount of nodes in the last layer is (N+1)/2

Because at each stage of traversal from the turtle and the hare we have to compare the "layer" of traversal between the hare and the tortoise to see if they overlap there is a roughly O(((N+1)/2 )^ 2) comparison as we cycle through each node in the layer to see if it exists in the other layer being compared. This reduces to O(N^2) processing time.

So total is O(N^3) because you have O(N) traversal time, and O(N^2) comparison time on each traversal step. You can reduce the layer comparison to O(N) if you save one layer to a hashset and then compare that set to the other layer, but that has a memory cost and only reduces the total big O to O(N^2).

There's an error in my calculations. Apologies.

The comparison happens at every level and NOT every node so, it's O(N) for traversal then when a layer is completed a comparison occurs at a cost of O(N^2). The amount of times this happens is equal to the height of the tree which can be simplified to logN. So total big O is O(N) + O(N^2 * log2(N)).

This simplifies to O(N^2 * log2(N)) which, while better than O(N^3), is still really bad.

I forgot to mention, BFS requires a queue. So if the graph is size N and with one root node and all the rest of the nodes as children that queue will go up to size N. So your solution doesn't actually get away from the memory issue at all.