Hacker News new | ask | show | jobs
by herodotus 3134 days ago
One of my all-time favourites is this: suppose you have a linked list that eventually cycles: that is, the link of one of the nodes in the list points to a previous node in the list, but not necessarily the first node. Write a function to compute the cycle length of the cycle. Now if it was just a simple circular list, this would be trivial: set a pointer to a node, and move another pointer a node at a time until the two pointers meet. However, this list has an initial sequence of nodes of unknown length that is not in the cycle. So the trick is really to find a node that is guaranteed to be in the cycle.

A nice way to do this is to start with two pointers to the head to the list, and then advance one by a single node at a time, and the other by two nodes at a time. It is easy to see that they will eventually meet on the cycle. Once that happens, it is easy to count the cycle length.

Now suppose you are running a casino and you are using a random number generator. For any given seed, the generator will eventually cycle, but not necessarily to the seed. You can compute the cycle lengths for a given seed with two variables by analogy with the linked list solution stated above.

9 comments

As others have said, this is actually a, perhaps the best, canonical example of a terrible interview question [1].

[1] https://news.ycombinator.com/item?id=7953725

Totally agree.

If I ask this question (I never would) and I get the "tortoise and the hare" answer, it's much more likely that the candidate already knew the answer rather than deduced it themselves. And I have learned exactly nothing about that candidate, except that they probably looked up common interview questions before coming.

A more reasonable answer to this question would be to store the nodes you've already seen in a hash map, keyed to their position, and stop when you encounter the same node twice. This in turn leads to interesting discussions about list size, memory cost, and more general questions about the properties of a hash map.

Plus, someone should be able to get the hash table brute force answer coming in cold.
Only if you're looking for that specific answer? Seems pretty trivial to write a memory heavy solution with an array or hash which simply asks: "Have I seen this node before? Is it the first node?"
Does that tell you anything interesting about your candidate's ability to think critically and problem solve, though? There are far better questions (and indeed, types of questions) that can cover more ground and be more candidate-friendly.
I liken these questions to a problem with a fancy car. If you have the exact tool it's trivial but if not damn near impossible.

And a horrible interview question. It shows only that the interviewee has heard the question before or not.

Who couldn't figure out the idea behind tortoise and the hare? Maybe not the implementation, but the concept is probably the simplest way of going about it.

I am no programmer, and I managed to implement it not too long ago all on my own. It took no more than 10 minutes from idea to working implementation.

As the linked article in parent post points out, this was an open problem in CS for 12 years before someone came up with the tortoise and hare algorithm. You really expect a candidate to come up with it from the top of their head, in a few minutes, during a stressful interview?
No not the specific solution, but the concept of having to pointers moving at different speeds did be pretty clear no?

It is really not a hard problem. If you are dealing with cirular lists you will end up having to solve it sooner or later. The "open question in CS for 12 years" is probably easier explained by "nobody could be bothered" than "it is hard".

Considering there are limited use cases for pointers moving at different speeds in other fields, no, I wouldn't say it would be easy.

The idea itself, once you hear it, instantly makes sense. But having the ingenuity to thinking of it during a stressful period shouldn't be used as a measure of how good a candidate would be for a position. Especially considering there are hundreds of different ways of approaching such problems.

> but the concept of having to pointers moving at different speeds did be pretty clear no?

> this was an open problem in CS for 12 years before someone came up with the tortoise and hare algorithm.

So, empirically, "no".

I mostly agree with this, however if the job requires a C.S. degree or equivalent and you assume all qualified candidates have been exposed to this, then perhaps it can be a useful question for testing how well you can explain a solution.

Just look at how varied and complicated some of the explanations are for this out there. Some take mathematical approaches, others (me) try to boil it down to a few intuitive sentences... I can see it giving some insight into a candidate.

Unless you're interviewing Dijkstra, it's a ridiculous question to ask. Anyone who's heard it before knows the answer instantly (I knew the answer before I even finished reading the description) and anyone unfamiliar with it will likely flail.
Nah, I wouldn't say everyone would flail but Dijkstra. There are simpler solutions to cycle detection.. maybe not as efficient as Floyd, but you could answer the question. For example you could start with something basic and unoptimized like keep a data structure of all the nodes you walk, and when you hit one that's in the structure then you've encountered a cycle. This is a fair approach to engineering solutions to novel problems -- don't panic, get something working, then optimize where necessary.

And if you already know the best practice, like I said, it's not just about whether you know the answer -- it's how simply you can explain it to a teammate and the theory behind why it works. That's also a useful skill to interview for, especially for more experienced candidates.

I’m not convinced anyone unfamiliar would likely fail. A pretty obvious solution is to just have a list of all nodes you’ve been to, and check if the current node is in that list. Not the most optimal solution, but one I’d imagine a lot of decent programmers would be able to come up with.

Not saying it’s a good interview question, just that most good programmers shoukd be able to come up with some form of answer if they understand linked lists.

>Unless you're interviewing Dijkstra, it's a ridiculous question to ask.

like many ridiculously looking interview questions, they are actually a filter. For this particular types of questions young people tend to quickly solve them. In 1987, in 9th grade, during one of the first computer classes, never before seeing or touching programming (previous years of school in a small town in USSR), I myself came up with Dijkstra algorithm in less than 10 minutes when we were assigned task of finding shortest path. I remember the teacher looking impressed :) These days i treat it more like a signal - if people asks such puzzles a lot then they either intentionally filtering or even don't understand that and seriously think it is important CS(cience).

> and you assume all qualified candidates have been exposed to this

The point is that that's an absurd thing to assume. It would almost never come up organically (and has only really come up in the context of awful interview question blogs). It's not a taught/talked about algorithm.

Rote memorization is also a poor metric for job performance.

My CS degree had a duration of 5 years, and I graduated during the 90's.

Excuse me if I don't recall a complete year of Algorithms and Data Structures lectures and evaluation projects done about 25 years ago.

This is the tortoise and the hare (https://en.wikipedia.org/wiki/Cycle_detection)

Expecting a candidate to actually know this trick is however condoning the fact that the technical interview is just a matter of cheating and cramming.

Whether it is a good or a bad interview question depends on how you use it. I've used a variant of this when interviewing CS PhD candidates. I don't expect anyone to know how to solve this immediately. If anyone did, I'd assume they'd seen it before, and ask something else.

What I want is to see how someone thinks about algorithmic problems. As they talk through what they're thinking, there are certain to be problems or limitations with their first suggestion. I'll probe on these, and see whether they understand those issues as they're pointed out. I'll give hints and see how they assimilate new ideas and information. In the end, 75% of candidates will get to a reasonable solution, and 25% will get to an optimal one, given some hints along the way. I'm much less interested in whether they get to the optimal one, or even in how fast they get there, than in what happens during the discussion along the path to a solution. You learn a lot about how someone thinks from that. It's also useful to learn how someone communicates - this is also an essential part of a PhD - there's a lot of back-and-forth goes on regarding possible research ideas, so seeing how someone communicates their ideas is useful.

In summary, in an interview, such questions can be good as the basis of a dialog, but are useless unless the interviewer understands this is what they're actually trying to achieve.

You're not getting any insight into the candidate's problem solving ability in general, though. You may be getting insight into how they think about a specific class of algorithmic problems, and possibly (although it's unlikely) algorithmic problems in general.

What this interview question does is confirm your own biases regarding what "algorithmic thinking" is and little more.

Of course you're correct, and this is why such a question only forms a part of such an interview. I'm much more interested in what someone has built. We'll have a good discussion about any software they've built, and what they found interesting or difficult about that. Or basically anything where they showed creativity.

Still, in my area of CS, algorithmic thinking is an important aspect of the sort of systems we design, so this sort of question does help me build up a picture of the person's skills and approach to problems.

I have found though that there's pretty good correlation between how someone goes about answering this question (the mental process they follow, not whether they can jump to a solution), and how interesting the systems they've previously built are. And, although my sample size is small, the people who did best on this question also did best over the next few years on the systems they built and analyzed during their PhD.

How does one, then, get an insight into the candidate's problem-solving ability in general?
By working with them on a number of non-trivial problems over the course of some time, usually measurable in weeks or months at least.
OK, I wholeheartedly agree with that. I was thinking, though, that discussing a problem such as the one with the linked list at least would give one sample point of the candidate's ability to reason. In any case, what are you supposed to do if you are restricted to making your decision based on interviews?
I've used a variant of this when interviewing CS PhD candidates.

If in fact it's for an actual PhD qualifying exam (or something similar), then this question might be OK.

The problem with the current crisis in interviewing is that it's become almost standard practice to ask questions like this (or its siblings: knapsack, outré graph search or sorting questions, etc) for what are basically run-of-the-mill API monkey / grunt finance programming / etc jobs.

As if the message these companies intend to convey is: "Fuck, we have no idea how to assess these candidates, nor do we have the time. So if we just ask a few gee-whiz questions that 75% of them will fail, then that might be an indication that the other 25% just might be a little smarter. Or at least very good at cramming. Because after all, that's how we got through college, too."

It's essentially impossible to require candidates not to prepare for a technical interview, but someone who can't discover the tortoise and hare algorithm on their own probably shouldn't have a computer science degree.
> someone who can't discover the tortoise and hare algorithm on their own probably shouldn't have a computer science degree

This sentence is simply ridiculous. Most engineers aren't able to discover this trick. I wouldn't.

The same goes for the rod cutting problem or the maximum sub-array problem (Kadane algorithm): expecting an engineer to find them, especially in an interview, is totally inane.

Of course, nobody expects engineers and scientists to have the same skills.
Then why is tortoise and the hair credited to Floyd if it’s so trivial?
Attribution in science works in mysterious ways, and I don't claim to know how it works.

In any case, I didn't say “trivial”. If you can only derive trivial things, then you shouldn't be a computer scientist.

Are you suggesting computer scientists should be able to derive non trivial results on job interviews? Really?
This amount of nontrivial, yes. Similarly, I'd expect a computer scientist to come up with a proof of correctness for either binary search or merge sort (the out of place variant) in ~30 minutes. It's not particularly difficult.
You didn't say the answer has to use O(1) space, so I'd just throw the nodes in a WeakSet, and when I reach a node that's already in the WeakSet, that's the start of the loop. Or better yet, use a WeakMap to associate nodes with their position, so that once I find the start of the loop I can immediately subtract to get the loop length, no second pass needed.
This algorithm can also be used for integer factorisation: https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
I always wonder if odd(and perhaps pointless) interview Q like this have been contorted from the original(much useful) problem like pollard's rho or it just happened to reduce to more useful ones.

Digging deeper into such questions yields fun observations which may be the only useful thing that comes from questions like these. Like the length of the Longest Increasing Subsequence of a sequence A, has a fun fact to do with the Chromatic Number of a graph derived formed from a matching between A and sorted(A).

Theoretical Math is beautiful and the answer to so many things.

I recently received this as an interview question.

This is one of those tricks once you know it, you'll remember it forever.

Which makes it a fun thing to be aware of but a really terrible interview question. It's been kicking around forever as such, unfortunately.
I was asked this exact question during an interview for a senior scala dev position and nearly walked out. I asked the guy to show me how one would go about implementing a cycle with prepend only api and of course did not get an answer or the job
A proof that the algorithm detects the existence of a cycle.

If there is a cycle of length n then eventually both pointers get into the cycle. Since at each step the distance between the two pointers is increased by one module n then the distance becomes zero. Also the time to detect the loop is bounded by the time it takes both pointers to get into the loop plus the cycle length. Also this proof shows that the key ingredient is that the difference between the pointers must be coprime with n but still n steps are required to guarantee that the difference eventually becomes zero. So there is no advantage in choosing other values for the increment of the pointers except to achieve that the slowest get faster into the cycle.

Thank you, super.
if you program in Lisp the function list-length gives nil for circular lists. Ninety-nine problems contains similar problems with lists.

I should have said that the difference between increments must be coprime with n to guarantee that the differece becomes 0 module n, that is both pointers get equal.

> It is easy to see that they will eventually meet on the cycle.

Not easy for me, but thanks for the algorithm, maybe I will find time and energy to understand and believe. Why by two at a time and not by 3?

Consider a cycle of length n. Consider two integers, i and j that have arbitrary initial values in [0, n).

We want to prove that the following loop always terminates:

    while i % n != j % n:
        i += 1
        j += 2
If we represent this as an equation over time, where t is the number of loops, we want to prove that for some t the following is true:

    (i + t) % n = (j + 2t) % n
Solving for t:

    (i % n) + (t % n) = (j % n) + (2t % n)
    (i % n) - (j % n) = t % n
    (i - j) % n = t % n
We can generalize for arbitrary speeds of the pointers. Let's say i moves at speed x and j moves at speed y:

    (i + x*t) % n = (j + y*t) % n
    (i % n) + (x*t) % n = (j % n) + (y*t % n)
    (i - j) % n = t*(y - x) % n
This will always have a solution if (y-x) is coprime with n.

A trivial case where it doesn't work, for example, is x=2, y=4, n=any even number. Imagine if i starts on an odd number and j starts on an even number. It's clear that i will always be odd and j will always be even. Note that y-x (2 in this case) shares a prime factor (not coprime) with all even numbers.

Thank you very much. I like the equation over time.

Your explanation is difficult to follow when you do distributivity of % over +. I mean when you move from

    (i + t) % n = (j + 2t) % n
to

    (i % n) + (t % n) = (j % n) + (2t % n)
Here I read the '=' as normal number equality. A simplified example to demostrate that distributivity is not valid here:

    (5 + x) % 3 = 4
    5 % 3 + x % 3 = 4
In the second line x=5 is a solution. But in the first line x is not a solution.

It's better to express your explanation using "= (mod n)" from the beginning.

    (i + t) % n = (j + 2t) % n 
    i + t = j + 2t (mod n)
Note that if you add the requirement that i and j start on the same node, then no matter what step size you use for x and y, as long as they are integers > 0 and x != y, then i and j will meet again if and only if there is a cycle.

For particular values of x, y, and loop period there may be values of i and j such that going forward from those they never meet, but it is not possible to get into one of those positions if a past state had i == j.

Proof:

In the following I'll use your notation for the most part, except allowing for a "lead in" of length L from that common starting node to the start of the loop, so the nodes are numbered 0, 1, 2, ..., L-1, L, L+1, ..., L+n-1, with the nodes in the loop being {L, L+1, L+2, ..., L+n-1}.

At time t, i has taken xt steps and j has taken yt steps. Let t >= L/x and t >= L/y, so that i and j are both past the lead in into the loop.

Then i is xt-L steps into the loop, and j is yt-L into the loop. These will be at the same spot in the loop if xt-L = yt-L mod n, which happens whenever (x-y)t = 0 mod n. All multiples of n are solutions of this, so just pick any that are >= L/x and L/y, and you have a solution.

If i and j do not start at the same place, all bets are off. That 0 is replaced with the differences between the starting node numbers, and as your example shows may not have a solution for some combinations of parameters.

Hi! Just a question, since I've never been good with proofs. Why use the modulo operator in the while-loop? Couldn't we just test if i != j?
The modulo represents the cycle.

Maybe it would be helpful to rewrite the loop with a condition as you described, and %= i and j with n after incrementing (by 1 or by 2) both variables in the loop body.

Intuitive explanation: Once the pointers enter the cycle the fast one will eventually approach the slow one from behind like runners on a stadium track. If it's one node behind the slow one, then they'll meet on the next move (slow advances one, fast advances two). And if the fast one is two nodes behind, on the next move it'll be one node behind.
3 will also work. People pick 2 intuitively, and it minimizes the overall runtime of the algorithm.

(google "Proof of Floyd's Cycle Chasing" for details)

Another advantage of 2 is that it works even if the two pointers do not start on the same element.
It's not specifically 2, any two speeds that are coprime will have the same property.
Counterexample: loop of 8 nodes, numbered 0, 1, ..., 7.

Start with one pointer at node 0, moving with speed 5. Start the second pointer at node 1, with speed 3 which is coprime to 5, the speed of the first pointer.

The pointer positions at each step are:

  0 5 2 7 4 1 6 3 repeat
  1 4 7 2 5 0 3 6 repeat
The condition we need in order to guarantee it works regardless of where the two pointers start is that the difference between the speeds is coprime to the cycle length.
Off the top of my head, because if you don’t know where the last element points back to, and there’s an even chance it could be any previous element, the median element is the one on the middle. Two at a time means when you reach the last element and loop back. The lagging pointer will point to the middle element so. Across a population on lists, it will perform better that ratios that favour the lagging pointer being closer to ge back end of the list when the leading pointer loops.

Also, with a 3 skip you could jump 2 ahead of the lagging pointer when you loop back behind and jump over it, which means when it advances it won’t catch you so they won’t meet.

"It is easy to see that they will eventually meet on the cycle. Once that happens, it is easy to count the cycle length."

Is that so?

The second part of this confused me as well, because the algorithm only detects the existence of the cycle, not its length. However, once you know a node in the cycle (which this algorithm will provide), you simply go around the cycle once more and you have the length.
More specifically you stop moving the hare and send the tortoise around again until it meets the hare counting your steps along the way.