Hacker News new | ask | show | jobs
by timr 3464 days ago
"You need to ask yourself why you believe the "average coder" should be able to solve that because clearly your beliefs are not founded in reality."

Oh, stop.

The problem he describes is trivial, and something that you'll encounter as an entry-level web developer on a regular basis. If you can't solve it, you're absolutely not up to the job. In fact, I'll go further: if you literally cannot find an efficient way to filter a list of stuff based on a criteria, you're not even a programmer yet. It doesn't matter if you've "written" a dozen toy webapps by stringing together NPM modules -- not knowing these basic things makes you a danger to any team that hires you.

You can't judge the quality of a test exclusively by the number of people who fail it. If you resume screen for "has written code before" and 80% of your applicants fail that test, is your standard set too high?

(In case you're wondering, that's not a hypothetical example.)

2 comments

> If you resume screen for "has written code before"

I wish that was the resume screen criteria at all the places that ignored my applications.

The more I read these threads the more I think resume filtering is part of the problem. It makes some sense; if the bad applicants have to apply to hundreds of jobs to get hired they have likely learned how to game the resume scteen.

Resume filtering is terrible - I tend not to be involved in the process since usually recruiters or hiring managers do this. However, when someone stops by my desk and asks "should we schedule a phone screen with this candidate who lists HTML, CSS, and jQuery, as programming languages" for a senior web developer position, I say no.

A huge problem is people list technologies as keywords on their resumes and rarely indicate they know what those words even mean. This makes screening next to impossible at scale (think 1000's of resumes coming in for a single position).

I know this isn't always the case, but how does one note competency in a given language or with a given tool(set) while fitting a resume on a single sheet. Especially if it's for a senior/lead or higher position?

There's a lot that's broken with the hiring process. One of the toughest thing as a potential candidate is how to properly tailor a resume to fit the bill without "gaming" the hiring process. There is no standard - only methods that work better than others in most situations.

My personal approach is to present myself as an individual with a large amount of skill and experience working in a specific programming discipline (e.g. web applications). I present a short list of the core tech I have experience with, ones I'm comfortable answering interview questions in. I do not list every tech I've used - that list would be half a page long and do no one any good.

I describe the projects I've worked on, what the problems challenges were, and how my work helped solve them. The format I use tends to work itself out as a simple narrative outline, and I tailor every resume to be the most relevant to the position I'm applying to, and I submit a cover letter specific to the job. This isn't "gaming" the process, this is doing your homework on the company/position and selling yourself as a viable candidate. If I'm weak in a desired skill, I call it out in the letter and demonstrate my past experience as an example of how I can learn new technologies quickly (something vital for any programmer, IMHO).

That said, after a certain level of experience, relying on your resume to get you in the door isn't going to get you the job you want in most cases. You want to meet people directly and apply via recommendations or requests. Direct contact with peers working at the company you want to apply to matters a lot. Even communication with a recruiter is better then blind submitting to a job post.

Is he talking about this problem (with a filter at the end)?

http://www.geeksforgeeks.org/finding-all-subsets-of-a-given-...

Because I have never done that. That's different than just running through a list and picking out items that meet a criteria.

Not that I can see. The OP said:

"find the subset in a given collection that matches this specific criteria"

So basically, a loop through a single table. That's as simple as it gets. You can make the problem more complicated, of course (e.g. "write a method to find the minimum and maximum ages of the male users"), but it's still pretty simple stuff.

A slightly less trivial "algorithm" question that should be equally easy for any decent programmer: I give you a document of english words. Write a function that counts the words, and returns the top ten words seen, by frequency. Now...solve the same problem when it's not a single document, but a stream of words of unspecified length. Don't run out of memory.

Depending on the wording it looks like his go to intro problem is actually easier than Fizz-Buzz as it is just filtering. For example say it was "Given a set return a subset where the values are less than 10." (such as set.Where(i => i < 10))
I'm hard pressed to imagine how this couldn't be done efficiently. Do you have an example which correctly solves the problem but takes too long?

The only thing I can think of is something like the output rewriting an array on update over and over.

Do you have an example which correctly solves the problem but takes too long?

Behold

  var AGESORT = rows.bubblesort("AGE ASCENDING")
  var AGESORT1 = rows.bubblesort("AGE DESCENDING")
  var MALE = "None"
  var MALE1 = "None"

  for row in AGESORT
    if row.isMale
       MALE = row
       break

  for row in AGESORT1
    if row.isMale
      MALE1 = row
      break
    
  
  print "Min age: " + MALE
  print "Max age: " + MALE1
Stuff like this is usually the result of thoughts like "Ok, I'll break this problem into two, then combine the pieces. First I'll find the minimum age, which is easy once I sort the rows. Done. Ok, now I can see I can slightly alter that code so that it finds the max age! Now, I just put the pieces of code together, and do a little code organization to put like with like. I"
Good point.
Which problem? The first one? Don't underestimate the badness of the average interviewee.

For the "find the min and max of a set" in particular, a lot of folks start out with terrible solutions.

Well either I suppose. For the subset problem as you say a filter critera can be applied while looping through the collection. And that should be it. Now if the question was multiple criteria I could see the solution diverging rapidly. If the filter can be expressed as a rank we could sort the collection first and then binary search through it to find the boundaries. I suppose I'm just trying to imagine some way an interviewee may turn that into two loops, or three loops... more?
I would like to see a solution to your problem with bounded memory.

In particular, the case where I want the top 3 words, you don't know the length of the stream, and you get random permutations of the same 4 words until I stop emitting them (where I will end by emitting 3 to break the tie).

That's not to say your problem isn't interesting -- just that while specifically constructing a problem as an example, you created one that's unbounded in required memory (in both storage per value and potentially number of values to store) and then demanded a solution that doesn't run out of memory.

I think most interview questions are similar nonsense.

"I think most interview questions are similar nonsense."

It's probably a good idea to be careful with your words when you admit that you don't know the answer to a question.

First off: your example (random permutations of the same four words) doesn't require much memory at all. So if you think it does, you're wrong. You might overflow your counters, but that's a different problem.

A stream of random gibberish is certainly more challenging. But the cardinality of the English language isn't infinite (the OED has about 230k words, and that's with lot of words that nobody ever uses), so even a naive solution doesn't require "unbounded memory", as long as you take the problem statement seriously and don't do something ridiculous. That would be good enough to pass an interview.

But OK, let's say you do have a stream of random latin-encoded gibberish. What then? The problem statement is that you have to determine the top-10 words (or in this case "tokens") by frequency. The cardinality of the set is infinite, but the probability of duplication per token is small, and the output set is tiny. Do you really think you need unbounded storage?

In any case, even if you think a problem is "nonsense", it's probably true that the interviewer has thought about it more than you have. The part that frustrates you is highly likely to be the bit worth probing. A bad candidate will bomb out immediately; a decent candidate will provide a solid, if not perfect solution; a great candidate will solve the problem, see the broader theoretical aspects, and investigate those as well.

Do you really think you need unbounded storage?

This is an example of a trap that interviewers run into when they try to arbitrarily reword questions.

While I could be mistaken, an exact solution to the top-k problem requires O(N) space where N is the number of distinct items. I can trivially think of a stream of tokens that would defeat any reasonable computer in both available memory and general "storage" (ex: 1 quintillion distinct words, then the next 10 words are a duplicate of an existing word, then end of stream). Since you asked for an algorithm and not a heuristic, it's clear you are looking for an exact answer. So the answer is "Yes, I would need unbounded storage for the new question as asked". Since this is an interview, I'd expect that'd you'd want me to give you the technically correct, and accurate answer.

I've tripped up enough interviewers who have tried to slightly reword questions, but ended up changing their meaning.

Well, "algorithm" is just a way of saying "procedure", so a heuristic applies (IMO). I would accept a reasonable heuristic without much argument because the problem of random text is much harder...but honestly, if you got to this part, you've already passed the interview. A total flameout on this question is not even realizing that the storage scales with vocabulary size, instead of sequence length.

An exact solution to an unbounded sequence of purely random text probably does require unbounded memory (I say 'probably' only to hedge my bets here). But depending on the definition of "random", you can put pretty tight bounds on it and only be off by a little. I haven't done the math, but my gut says that a bloom filter, followed by incrementing counts only for positive hits from the filter, would scale well. There may be simpler approaches that make use of word length and the size of the latin alphabet (e.g. "all tokens of size N have 1/C(26,N) probability of colliding if letters are chosen from a uniform distribution, therefore...")

But again, if we actually had this conversation in an interview, there'd be no danger of not passing. Unless you were a jerk or something.

If your algorithm overflows counters and produces incorrect output as a result, then you don't have a correct solution.

There's no way to solve the unbounded stream case correctly with finite counters as you admit.

So you didn't actually present a bounded memory solution to the problem.

The number of bytes in a machine number is not the limiting factor here. You could use arbitrary precision numbers (with some constant factor of additional storage), and solve the precision problem.

Being pedantic about this point doesn't get you closer to a correct answer. Read dsp1234's responses. They are correct.

In the given case, it's actually the "English words" limitation that saves the question. An exact answer to top-k problems is O(N) is storage space where N is the number of distinct items. So since N is ~200k, it's not a terrible problem to deal with. When N is 3 as in your given case, it's not really a problem. The general case, of when the stream is unbounded in the number of distinct items, is much harder, say the stream is 1 quintillion non-repeating words, then starts repeating. This requires an approximation algorithm, but is not likely the solution that was looked for since the person specifically said "slight less trivial", and the streaming approximation algorithm is more complex.

Though it could be still be saved by flipping the question back onto the interviewer's phrasing. They said "don't run out of memory", not "don't run out of disk space". So you could also solve this problem by, for example, writing the data into a database, then using SQL group by and top X functions to solve it once the stream has ended. But as an interviewer, I probably wouldn't be amused.

Hah! Hell, I'd even give credit to the SQL answer, if it came with the rest. ;-)

The only thing I'd add is that you can probably come up with a pretty simple heuristic or three to solve the problem for the case of truly random gibberish, without the need for elaborate data structures. But yes, this would be the answer of a great candidate.

You can't have finite counters per word, else you can't disambiguate between counter length of a single word then the pattern I described and just the pattern I described, despite the fact it impacts the answer.

Your solution is incorrect because it fails to handle arbitrary length streams as specified -- as do the other glib answers.

My point was the specification didn't seem to align with the intended problem, and that interviewers tend to make such mistakes on the description, and then get mad when you ask them to clarify if we're talking practice or theory.

Similarly, you don't account for arbitrary proper nouns, which could theoretically be unboundedly introduced to an arbitrary sized English corpus. I guess you can debate which of these are "English" (itself a proper noun!), but interviewers often treat you as dumb if you don't divine their intent in such underspecified situations.

You can't have finite counters per word

You only need one counter per word.

a quick sketch of the basic algorithm is:

  1.) Create a new list of type <string, number> into COUNTS
  2.) Read a word from the stream into WORD
  3.) if WORD exists in COUNTS, then increment the number for that entry
  4.) if WORD does not exist in COUNTS, then add WORD to COUNTS with a number of 1
  5.) if not end of stream, goto 2
  6.) traverse through COUNTS keeping a running list of the top 10 highest entries (Note that this and step 4 can often be combined to keep a running total of the top 10) into TOPCOUNTS
  7.) print TOPCOUNTS
This algorithm is O(N) space where N is the number of distinct items, as I mentioned previously.
That's more advanced then the question I ask typically. Though I might use something like this if I'm interviewing someone with many years of experience (10+).

As I explained above, my question is a warm up - meant to break the ice, calm nerves. But it's surprising to me how effective even a simple list traversal is at identifying weaknesses in a candidates programming ability.

Depending on the platform, there's at least 2 ways to do that :-)

E.g. - "functional" is simply something like:

    subset = filter( predicate, superset )
... but on a "procedural" platform, it's a bit more challenging, starting with how much space to allocate for the result, or whether to mutate the input, and other wading through the swamp.

But, yeah, I can see how it would get people to start up talking somewhere, or else go into "deer in the headlights" mode.

Just out of curiosity: how would you feel if I answered this question by using existing function [1] in stdlib? Would you consider that as (good) sign of knowing the tools or would you prefer a fresh implementation?

[1] https://docs.python.org/2/library/itertools.html#itertools.c...

I see it as a positive if a candidate uses an existing solution to the problem. That's exactly what a good engineer would do in the real world.

But then I take away the library/method and re-state the problem. Because the point is to see if they can understand the problem, and devise a solution.

Cool. Thanks for your comments.
Heh, at my previous employer the only coding question we asked during interviews was: "Find the largest element in an array of integers." Some of the ways people found to not solve that simple problem were amazing.

Now, I'm generally against asking hard-core CS questions in a live interview... but a simple bozo filter is probably a good idea.

It really does surprise me how a simple programming questions like this does stump the average interviewee. If they need a bit of guidance to get started, I never hold it against the candidate, but if they need a massive hand holding to come to a solution, or if they can't explain how their solution works and how it could be improved, then I know where they stand programming wise.

It is a low bar skills wise, and it tells a lot within a very short amount of time.