| "I think most interview questions are similar nonsense." It's probably a good idea to be careful with your words when you admit that you don't know the answer to a question. First off: your example (random permutations of the same four words) doesn't require much memory at all. So if you think it does, you're wrong. You might overflow your counters, but that's a different problem. A stream of random gibberish is certainly more challenging. But the cardinality of the English language isn't infinite (the OED has about 230k words, and that's with lot of words that nobody ever uses), so even a naive solution doesn't require "unbounded memory", as long as you take the problem statement seriously and don't do something ridiculous. That would be good enough to pass an interview. But OK, let's say you do have a stream of random latin-encoded gibberish. What then? The problem statement is that you have to determine the top-10 words (or in this case "tokens") by frequency. The cardinality of the set is infinite, but the probability of duplication per token is small, and the output set is tiny. Do you really think you need unbounded storage? In any case, even if you think a problem is "nonsense", it's probably true that the interviewer has thought about it more than you have. The part that frustrates you is highly likely to be the bit worth probing. A bad candidate will bomb out immediately; a decent candidate will provide a solid, if not perfect solution; a great candidate will solve the problem, see the broader theoretical aspects, and investigate those as well. |
This is an example of a trap that interviewers run into when they try to arbitrarily reword questions.
While I could be mistaken, an exact solution to the top-k problem requires O(N) space where N is the number of distinct items. I can trivially think of a stream of tokens that would defeat any reasonable computer in both available memory and general "storage" (ex: 1 quintillion distinct words, then the next 10 words are a duplicate of an existing word, then end of stream). Since you asked for an algorithm and not a heuristic, it's clear you are looking for an exact answer. So the answer is "Yes, I would need unbounded storage for the new question as asked". Since this is an interview, I'd expect that'd you'd want me to give you the technically correct, and accurate answer.
I've tripped up enough interviewers who have tried to slightly reword questions, but ended up changing their meaning.