| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by SomeStupidPoint 3464 days ago

You can't have finite counters per word, else you can't disambiguate between counter length of a single word then the pattern I described and just the pattern I described, despite the fact it impacts the answer.

Your solution is incorrect because it fails to handle arbitrary length streams as specified -- as do the other glib answers.

My point was the specification didn't seem to align with the intended problem, and that interviewers tend to make such mistakes on the description, and then get mad when you ask them to clarify if we're talking practice or theory.

Similarly, you don't account for arbitrary proper nouns, which could theoretically be unboundedly introduced to an arbitrary sized English corpus. I guess you can debate which of these are "English" (itself a proper noun!), but interviewers often treat you as dumb if you don't divine their intent in such underspecified situations.

1 comments

dsp1234 3464 days ago

You can't have finite counters per word

You only need one counter per word.

a quick sketch of the basic algorithm is:

  1.) Create a new list of type <string, number> into COUNTS
  2.) Read a word from the stream into WORD
  3.) if WORD exists in COUNTS, then increment the number for that entry
  4.) if WORD does not exist in COUNTS, then add WORD to COUNTS with a number of 1
  5.) if not end of stream, goto 2
  6.) traverse through COUNTS keeping a running list of the top 10 highest entries (Note that this and step 4 can often be combined to keep a running total of the top 10) into TOPCOUNTS
  7.) print TOPCOUNTS

This algorithm is O(N) space where N is the number of distinct items, as I mentioned previously.

link

SomeStupidPoint 3464 days ago

Again, you're assuming count has a bound size per word, which isn't true for an unbounded stream.

Step 3 of your algorithm either requires unboundedness of count (ie, count can use arbitrary amounts of memory) or can overflow on arbitrary length streams of words (and hence, has cases where it produces the wrong output).

link

AmitJS 3463 days ago

Hash tables FTW!

link