Hacker News new | ask | show | jobs
by SomeStupidPoint 3464 days ago
You can't have finite counters per word, else you can't disambiguate between counter length of a single word then the pattern I described and just the pattern I described, despite the fact it impacts the answer.

Your solution is incorrect because it fails to handle arbitrary length streams as specified -- as do the other glib answers.

My point was the specification didn't seem to align with the intended problem, and that interviewers tend to make such mistakes on the description, and then get mad when you ask them to clarify if we're talking practice or theory.

Similarly, you don't account for arbitrary proper nouns, which could theoretically be unboundedly introduced to an arbitrary sized English corpus. I guess you can debate which of these are "English" (itself a proper noun!), but interviewers often treat you as dumb if you don't divine their intent in such underspecified situations.

1 comments

You can't have finite counters per word

You only need one counter per word.

a quick sketch of the basic algorithm is:

  1.) Create a new list of type <string, number> into COUNTS
  2.) Read a word from the stream into WORD
  3.) if WORD exists in COUNTS, then increment the number for that entry
  4.) if WORD does not exist in COUNTS, then add WORD to COUNTS with a number of 1
  5.) if not end of stream, goto 2
  6.) traverse through COUNTS keeping a running list of the top 10 highest entries (Note that this and step 4 can often be combined to keep a running total of the top 10) into TOPCOUNTS
  7.) print TOPCOUNTS
This algorithm is O(N) space where N is the number of distinct items, as I mentioned previously.
Again, you're assuming count has a bound size per word, which isn't true for an unbounded stream.

Step 3 of your algorithm either requires unboundedness of count (ie, count can use arbitrary amounts of memory) or can overflow on arbitrary length streams of words (and hence, has cases where it produces the wrong output).

Hash tables FTW!