Hacker News new | ask | show | jobs
by djsumdog 2920 days ago
I knew CS graduates who loved parsing theory and could implement some basic C compilers, and I knew some who didn't know what a SCSI controller was. You get out of your education what you put into it.

I had someone in an interview in Berlin ask me to write a garbage collector. In CSC 202, our professor talked about using reference counts. Reference counts can get leaks if you have two objects referencing each other with to path to either, but what I didn't realize is that Java hasn't used reference counts in a really long time. It does a search from each root (typically a thread) down the object tree; and it breaks things into young/old (eden space and .. something else) so long lived objects don't get searched as often.

I learned all of that during the interview lunch break when I looked it up on my phone. One of my good friends wrote a compile time GC for GO and did this PhD dissertation on it, and he probably would have got this question right. But if it's not in your field, well the problem space for problems is pretty fucking large.

Cycle detection? Man I could probably tell you back when I studied minimum spanning trees and wrote this thing to implement Dijkstra's shortest path:

https://penguindreams.org/projects/graphmapper/

Off the top of my head, I'd hope each node had a unique identifier and I guess I'd mark them/store the keys in a hash table. I'd move breath first and error out if I discovered the same hash/unique key .. which of course would give me a hash the size of the tree. Unless there's a way to mark the data structure, you now have a second structure the size of your key space.

I'm sure there are better solutions, but I wouldn't expect a senior program to know them off the top of their head, unless you're hiring really specifically for a position writing routing algorithms or looking for a senior airport transit architect.

3 comments

When people say they expect someone to implement cycle detection, they usually mean Floyd's algorithm.

https://en.wikipedia.org/wiki/Cycle_detection#Floyd%27s_Tort...

If you come up with something else (like tagging nodes), you get a strike for inefficiency.

It looks simple enough to feel smug about once you know it, at the same time there's near zero chance the interviewers could come up with this algorithm independently without prior knowledge.

When people say they expect someone to implement cycle detection, they usually mean Floyd's algorithm.

Not necessarily. Sometimes the dataset only has on the order of 10k nodes, and you just want to warn users if they create a cycle, or keep particular routines from going into an infinite loop.

If you come up with something else (like tagging nodes), you get a strike for inefficiency.

A few days ago, I implemented cycle detection in an event notification system where the graph size is relatively small in just 4 lines of code, which should be immediately understandable by any competent programmer. That you should mandate Floyd's algorithm in all cases gives you a strike for pedantic design without regard to cost/benefit.

I'm not sure what are you trying to argue. I never ask obscure CS trivia, you misread my post entirely.
I just set a "visited" bit in the data structure while the code walks it. There's a cycle if the bit is already set. It's the same algorithm as yours, but a lot cheaper.
Perhaps cheaper in space, but you will have to do an initial pass over the entire structure to zero your visited bits first, so it may not be cheaper in time.
The GC already needs to sweep through the whole allocated memory to find the unreacheble objects to be freed. Flipping a bit while doing this isnt a big deal.
Just ensure the bits are always zero on allocation, use DFS, and ensure you re-zero the bits on the way out.
Since the first pass quit when it finds the first marked node, the zeroing can traverse the structure, erasing marked bits, and stopping at the first 0. The entire structure doesn't have to be visited.
The entire structure doesn't have to be visited.

We are both saying this.

You have to walk the hash table again to free it.
I had someone in an interview in Berlin ask me to write a garbage collector. In CSC 202, our professor talked about using reference counts.

Jeez. I think my profs covered ref counting as an introduction, then also went on to cover mark/sweep and generational collectors. We also covered compaction, copying and bump allocation. I don't think it took that long. If I were running a shop that focused on, say Java, I would want to know if my new hires knew background information relevant to optimizing code running on the JVM.

Cycle detection? Man I could probably tell you back when I studied minimum spanning trees and wrote this thing to implement Dijkstra's shortest path:

Someone should have given you breadth first search and depth first search, then ran you through how those are components to other algorithms. You should have been left with those as a "toolbox" such that you automatically spend a second thinking, "what would happen to that graph if I did DFS or BFS on it." That kind of toolbox is powerful and gives you all sorts of useful insights. You are not the only one around here to say, "Man I could probably tell you back when..." What you should be thinking now is that you were not well served by some of your teachers. Fortunately, this sort of thing is easily rectifiable.

Off the top of my head, I'd hope each node had a unique identifier and I guess I'd mark them/store the keys in a hash table.

Well good on you. You just did better than most of my last 6 interviewees.

Unless there's a way to mark the data structure, you now have a second structure the size of your key space.

This is the DFS/BFS part right here. Is the 2nd part of your statement likely to be true, and how often would it be true? Good call on the marking. (EDIT: Just thought of it: Bloom filter.)

I'm sure there are better solutions, but I wouldn't expect a senior program to know them off the top of their head

Cripes! Freshmen used to be able to do this stuff!

unless you're hiring really specifically for a position writing routing algorithms or looking for a senior airport transit architect.

How about you don't want programmers who will end up debugingg an infinite loop induced by a data cycle every single time?

EDIT: I actually just wrote cycle detection for my side project in golang the day before yesterday. It took me 4 additional lines of code. If you have a shop that uses gob or some kind of object serialization, this may well be very relevant!

I had someone ask me the cycle detection question once, and I didn't care for how they phrased it. Specifically, should I find it immediately upon entering the first cycle (at a higher memory/time cost) or should I just eventually detect it (e.g. turtle/rabbit)? It was on me to clarify, but as a newb to the industry, I felt like I should just know which was expected.

And either way, if I were to ask this question, I would spend a lot of time helping the person along the way to make sure they were able to make the logical leaps that made sense to them, not to me.

Here's how I asked the question. I would present the data for a 2 node cycle and ask what would happen to this routine. Here's a response I would get far too often: A conditional clause detecting a 2 node cycle. Then I would present a 3 node cycle, then ask how to detect an n node cycle, period.

Lots of 3.5+ GPA grads can't make that logical leap!

You're trying really hard to find a connection between implementing a GC and debugging memory leaks... and failing.

The whole freaking point of a GC like say Java's is that an average programmer can use it without having to understand how exactly it's implemented. Of course it won't hurt to know that, but it's not at all mandatory knowledge.

One just has to know which situations the GC can't cope with and avoid them. For Java there's at least one open source dedicated tool for finding leaks, it nicely explains what one needs to know.

One just has to know which situations the GC can't cope with and avoid them.

Unfortunately, many programmers believe that since Java uses garbage collection, you do not have to think about GC and ownership at all.

Oracle had to replace the fast implementation of substring that just returns a slice of a String (O(1) time) by a copying implementation (O(n)), because too many programmers do not know the basics of ownership/garbage collection and would accidentally hold on to larger strings.

Seeing the implementation details of reference counting, mark-sweep collection, and perhaps a generational collector once, makes you more aware of memory and ownership issues, even if you forget the nitty gritty details later.

You're trying really hard to find a connection between implementing a GC and debugging memory leaks... and failing.

I spent years at a vendor for a Virtual Machine. That you would compose such a sentence shows that you are ignorant of some aspects of optimization. You don't even know what you don't know, and projected that ignorance on another.

The whole freaking point of a GC like say Java's is that an average programmer can use it without having to understand how exactly it's implemented.

One of my company's most frequent consulting tasks was helping clients optimize to maximize throughput for the generational GC. That you jumped to the conclusion that I was talking about memory leaks is pretty damning.