I love it. You spark an (imho) much more interesting debate: that of raw smartness vs pragmatic productivity.
A colleague of mine (now hired) was asked to code a string compare during her interview for a .NET function. She said "String.Compare()". This puzzled the interviewers for a while. They asked her whether she could write it out, she said she didn't see the point.
They ended up hiring her for other reasons (notably, references from trusted colleagues who had worked with her), but I still wonder whether that attitude worked for her or against her.
There's something to be said for NOT reinventing the wheel. Standard libraries are standard for a reason - the "String.Compare()" function is likely faster for just about every case, in addition to being already there.
I keep seeing this argument on threads about interview questions.
The principle of not reinventing the wheel has nothing to do with interview questions. Sure, if you're actually working on solving an actual problem at your job, then most of the time you'll be better off using a standard library function instead of rolling your own.
In the context of the job interview, however, it doesn't matter if the solution to the problem is well-known or whether it can be found in a standard library. The problem of the interviewer is not to find out how to compare strings, it's determining if the candidate will be able to write proper code if hired. And the string-compare question might be a good starting point for evaluating the candidate's level. In any case, the fact that a solution is in a standard library is irrelevant.
If I was the interviewer I'd create a new data structure Foo and then ask the candidate to implement Foo.compare(). The question remains essentially the same, so I wonder what the candidate would reply.
> The problem of the interviewer is not to find out how to compare strings, it's determining if the candidate will be able to write proper code if hired.
The trouble with the question is that proper code to compare strings is almost certainly going to be a call to some existing library function. There are only rare cases (e.g. when you're one of the 50 people in the world who implement libc) that it makes sense to write it yourself. It's not clear from "code a string compare" exactly which set of wheels the interviewer wants reinvented: if strcmp is out of bounds, can you use strlen and memcmp? Because if strcmp was somehow buggy, that might be a reasonable thing to do. If the problem is that strcmp is too slow, should we maybe drop to assembly? Or change to a counted-string representation to avoid byte-by-byte operations? Or calculate hashes when strings are mutated, or intern them?
That's my point. At work, I'd prefer someone using String.Compare() over some nasty hand-crafted for-loop with switches and things for all kinds of collation issues. Why would I ask something else during the interview?
I love the GGP's solution for the same reason. If the regex is compiled only once, it may be only marginally slower (if at all), and it significantly improves readability and maintainability, and tremendously reduces the chances for bugs.
I'd hire the him. I have the impression most Google-interviewers (and similar dudes/dudettes) wouldn't. I wonder why.
>I have the impression most Google-interviewers (and similar dudes/dudettes) wouldn't. I wonder why
Perhaps that was rhetorical, but I'll answer anyway:
Being able to wire up existing libraries to accomplish a goal is a pretty low bar to set as far as proficiency goes. Google doesn't want code monkeys. The solution above is perfectly good from a software engineering perspective, but it doesn't show the depth of the candidates knowledge nor how strong their grasp of CS techniques is.
Google's interviews are more like IQ tests than software engineering tests, using CS as the measuring tool. When you're Google you can afford to be that selective.
I'm glad that you like my solution, but please note that, as another commenter said, it's a buggier version of the original regex I thought of ("^(dict|here)+$"), which I think should work but doesn't, at least not in Python.
I suspect it's because the match group is being replaced with the last match rather than added as another group, but it will work as a state machine, and is pretty much equivalent to the backtracking example in the article (although with much less code, and no memoization).
That said, I think that the reason interviewers ask about functions for which we have well-known implementations is to see whether or not you know how they work and/or could implement them yourself. Nobody will reasonably expect you to implement your own string comparison routine, but you could score points if, for example, you said Boyer-Moore for string searching rather than the naive iterative version.
The correct answer would be ['aa', 'ab'] but unfortunately findall works greedily and so will not find the optimal solution. It is possible to specify it as a regex, but common implementations might take too much time to come up with the good solution.
The mistake arises from the different semantics of findall versus what I initially wanted to do. I initially tried "^(dict|here)+$", which should return all the subwords, but doesn't (it only returns the last one).
I'm missing something, clearly, but my initial idea was a state machine that contained all the words (sorted by length, perhaps, but that's not necessary if we don't care about the solution with the "most words") and linked back to itself. It's essentially the same as the backtracking example, from a cursory glance.
Of course, findall is a dumb version that only finds disjoint substrings.
EDIT: It turns out that the group count in regexps is fixed, so it's not possible to return all matches of a single group, even if it's repeated. All my solution above does is show you whether there's a match or not, but not what it is (unless it's the single word).
Interestingly, the article explicitly states that our dictionary supports only the exact string lookup command, dict.contains(string). Strictly speaking, the full content of the dictionary isn't available to us, and we can't create the regular expression.
There's a library for constructing a deterministic regular expression from a dictionary, by the way, which would right away give you the exponential-time result. If you wrapped it in a * and applied it with a guaranteed-linear-time regular-expression engine like RE2, you'd find out whether the string was segmentable (and as a bonus you wouldn't have to construct the deterministic RE yourself) but I don't know if you'd get the actual segments.
rfumani gave one reason: it won't produce the right answer. Here's another: your regex engine will probably fall over dead if you ask it apply a million-word alternation.
A colleague of mine (now hired) was asked to code a string compare during her interview for a .NET function. She said "String.Compare()". This puzzled the interviewers for a while. They asked her whether she could write it out, she said she didn't see the point.
They ended up hiring her for other reasons (notably, references from trusted colleagues who had worked with her), but I still wonder whether that attitude worked for her or against her.