Hacker News new | ask | show | jobs
by pfdietz 700 days ago
You don't need to load trillions of numbers into memory, you just need to count how many of each number there are. This requires 2^32 words of memory, not trillions of words. After doing that just scan down the array of counts, summing, until you find the midpoint.
1 comments

Yeah, I thought of that actually, but the interviewer said “very little memory” at one point which gave me the impression that perhaps I only had some registers available to work with. Was this an algorithm for an embedded system?

The whole problem was kind of miscommunicated, because the interviewer showed up 10 minutes late, picked a problem from a list, and the requirements for the problem were only revealed when I started going a direction the interviewer wasn’t looking for (“Oh, the file is actually read-only.” “Oh, each number in the file is an integer, not a float.”)

That “miscommunication” you mention has been used against me in several interviews, because I was expected to ask questions (and sometimes a specific question they had in mind) before making assumptions. Well, then the 30min becomes an exercise in requirements gathering and not algorithmic implementation.
Which in fairness, is a reasonable competency to test for in an interview
Indeed. But I need clarity on which skill they want to test in thirty minutes.
Speaking as an interviewer: nope, you're not being tested on a single skill.
See, that's what the multi-round, two-hour interview blocks are for. Each interview tests a different set of skills.

If you're testing on algorithm implementation and requirements gathering in thirty minutes, you're not testing for the skills you claim to be testing for. There's no way you're getting a good (let alone accurate) picture of the candidate's ability to gather requirements and implement those requirements, especially if your selection tactic is to deny them because they didn't get the PhD answer.

You're testing for how good of a minion this candidate will be.

With 256 counters, you could use the same approach with four passes: pass i bins the numbers by byte i (0 = most sig., 3 = least sig.) and then identifies the bin that contains the median.

I really want to know what a one-pass, low-memory solution looks like, lol.

Perhaps the interviewer was looking for an argument that no such solution can exist? The counterargument would look like this: divide the N numbers into two halves, each N/2 numbers. Now, suppose you don't have enough memory to represent the first N/2 numbers (ignoring changes in ordering); in that case, two different bags of numbers will have the same representation in memory. One can now construct a second half of numbers for which the algorithm will get the wrong answer for at least one of the two colliding cases.

This is assuming a deterministic algorithm; maybe a random algorithm could work with high probability and less memory?